`
`Symantec 1008
`IPR of U.S. Pat. No. 7,757,298
`
`
`
`5,696,822
`
`OTHER PUBLICATIONS
`
`ThunderBYTE B.V., “User Manual”, 1995, pp. i-191,
`Wijchen, The Netherlands.
`“Virus Infection Techniques: Part 3”, Virus Bulletin, 1995,
`pp. 006-007, Oxfordshire, England.
`Cohen, Frederick B., “A Short Course on Computer Viruses,
`2d Ed.”, John Wiley & Sons, Inc., pp. 54-55, 199-209,
`1994, U.S.A.
`Veldman, Frans, “Heuristic Anti-Virus Technology”, Proceedings
`of the International Virus Protection and Information
`Security Council, Apr. 1, 1994.
`Wells, Joseph, “Viruses in the Wild”, Proceedings of the
`International Virus Protection and Information Security
`Council, Apr. 1, 1994.
`Gordon, Scott, “Viruses & Netware”, Proceedings of the
`International Virus Protection and Information Security
`Council, Mar. 31, 1994.
`Solomon, Alan, “Viruses & Polymorphism”, Proceedings of
`the International Virus Protection and Information Security
`Council, Mar. 31, 1994.
`Case, Tori, “Viruses: An Executive Brief”, Proceedings of
`the International Virus Protection and Information Security
`Council, Mar. 31, 1994.
`
`Skulason, Fridrik, “For Programmers”, Virus Bulletin, Jul.
`1990, pp. 10-11, Oxon, England.
`“Automated Program Analysis for Computer Virus Detection”,
`IBM Technical Disclosure Bulletin, vol. 34, No. 2, Jul.
`1991, pp. 415-416.
`“Artificial Immunity for Personal Computers”, IBM Tech-
`nical Disclosure Bulletin, vol. 34, No. 2, Jul. 1991, pp.
`150-154.
`
`Marshall, G., “Pest Control”, LAN Magazine, Jun. 1995, pp.
`54-67.
`
`Gotlieb, L., “End Users and Responsible Computing”,
`CMA—the Management Accounting Magazine, vol. 67, No.
`7, Sep. 1993, pp. 13.
`Karney, J., “Changing the Rules on Viruses”, PC Magazine,
`vol. 13, No. 14, Aug. 1994, pp. NE36.
`Schnaidt, P., “Security”, LAN Magazine, vol. 7, No. 3, Mar.
`1992, pp. 19.
`“UK—Sophos Intros Unix Virus Detection Software Jan. 26,
`1995”, Newsbytes News Network, Jan. 26, 1995.
`“Anti-Virus Company Claims Polymorphic Breakthrough
`Jul. 10, 1992”, Newsbytes News Network, Jul. 10, 1992.
`“LAN Buyers Guide: Network Management”, LAN Maga-
`zine, vol. 7, No. 8, Aug. 1992, pp. 188.
`
`
`
`U.S. Patent
`
`Dec. 9, 1997
`
`Sheet 1 of 4
`
`5,696,822
`
`[Drawing sheet 1 of 4: figure content not recoverable from the scan]
`
`U.S. Patent
`
`Dec. 9, 1997
`
`Sheet 2 of 4
`
`5,696,822
`
`Emulation Module
`
`[Instruction/interrupt usage profile 224: a string of ones and zeros; the individual bit values are not recoverable from the scan]
`
`FIG. 3
`
`
`
`U.S. Patent
`
`Dec. 9, 1997
`
`Sheet 3 of 4
`
`5,696,822
`
`[FIG. 4A flowchart; layout not recoverable from the scan. Legible box labels, in reading order:]
`Entry Point From Scanning Phase
`Prepare Virtual Machine (Emulator)
`Load Next File
`Examine File for Static Exclusions
`All Viruses Excluded?
`Fetch Instruction
`Check Off Viruses That Do Not Use Instruction
`Emulate Instruction
`Tag Virtual Memory Page Accessed
`Proceed w/Emulation?
`File Is Free of Infection (wording uncertain)
`String Scanning
`[Branch labels legible on the sheet: NO, NO]
`[Reference numerals on the sheet: 420, 424, 428, 430, 434, 438, 440, 450, 470, 474, 500]
`
`FIG. 4A
`
`
`
`U.S. Patent
`
`Dec. 9, 1997
`
`Sheet 4 of 4
`
`5,696,822
`
`[FIG. 4B flowchart; layout not recoverable from the scan. Legible box labels, in reading order:]
`Access Next Tagged Data
`First Byte of Next Word
`Last Byte of Tagged Page?
`Viral Signature Match?
`Indicate File Infected
`Load Next File For Analysis
`File is Uninfected
`Last Byte of Tagged Page?
`Return to Emulation Loop
`[Branch labels legible on the sheet: YES, NO, YES]
`[Reference numerals on the sheet: 432, 450, 454, 474, 480, 488, 490]
`
`FIG. 4B
`
`
`
`POLYMORPHIC VIRUS DETECTION
`MODULE
`
`BACKGROUND OF THE INVENTION
`
`1. Technical Field
`
`This invention relates to the field of computer viruses, and
`in particular to methods and systems for detecting polymorphic
`viruses.
`2. Background Art
`Polymorphic viruses are a type of computer virus
`designed to evade detection by infecting each new file with
`a mutated version of the virus. By providing each newly
`infected file with viral code having a different appearance,
`polymorphic viruses frustrate most standard virus-detection
`schemes, which rely on some type of string scanning to
`identify computer viruses in a file.
`Polymorphic viruses comprise a static virus body and a
`mutation engine. In the most common polymorphic viruses,
`the virus does not mutate. Rather, the mutation engine
`generates a virus decryption routine (polymorphic decryp-
`tion loop) and uses the dual of this routine to encrypt the
`static virus body and the mutation engine. The new decryp-
`tion routine and the newly encrypted virus body are then
`inserted into the host file. Common mutation strategies
`employed by the mutation engine include reordering of
`instructions, substituting equivalent instructions or equiva-
`lent sequences of instructions, inserting random “garbage”
`instructions (which have no effect on the virus
`functionality), interchanging function calls, in-line code,
`JMP instructions, and the like, and using equivalent registers
`interchangeably.
`
`Thus far, the most successful technique for detecting
`polymorphic viruses has been cue-directed program emulation
`(CDPE). CDPE methods assume that the polymorphic
`code contains at least one section of machine code, the static
`viral body, that is consistent from generation to generation.
`CDPE methods also assume that when executed the decryp-
`tion routine of the polymorphic virus deterministically
`decrypts the encrypted static virus body and transfers control
`to the static virus body when decryption is complete. The
`strategy employed by CDPE methods is to emulate the
`polymorphic virus until it has decrypted itself and then
`analyze the decrypted virus body using standard scanning
`techniques.
`CDPE virus detection systems comprise a scanner
`module, a CPU emulator (80x86), a set of virus signatures,
`and an emulation control module. The scanner module
`locates a file’s entry point and the CPU emulator performs
`a limited emulation of the file’s machine code under control
`of the emulation control module. Emulation proceeds until
`the emulation control module believes either that the virus is
`fully decrypted or that the file is not infected with a virus, at
`which point string scanning for virus signatures commences.
`The CDPE emulation control module examines each
`emulated instruction with the aid of certain heuristics to
`determine whether the instructions being emulated are likely
`to be part of a polymorphic decryption loop or a normal
`program. For example, certain sequences of instructions are
`frequently found in polymorphic decryption loops. These
`instruction sequences are referred to as “boosters” since they
`indicate to the emulation control module that it is seeing a
`potential decryption loop and should continue emulating
`instructions. Other sequences of instructions are rarely
`found in decryption loops. These instruction sequences are
`referred to as “stoppers” since they indicate to the emulation
`control module that the instructions are probably not from a
`virus decryption loop. Stoppers may be present if the host
`file is not infected or if the emulation has fully decrypted the
`static virus body. In the latter case, the static virus body, like
`any other program, may use any instructions supported by
`the processor architecture. In addition, stoppers may be
`present if a virus designer has included them in a decryption
`loop to foil CDPE detection methods.
`CDPE based methods employ additional heuristics to
`determine what the detection of various stoppers and boost-
`ers indicates about the code being emulated. For example, if
`a number of stoppers have been found prior to the detection
`of any boosters, the emulation control module will likely
`decide that the host file is uninfected. On the other hand, if
`one or more stoppers are detected following detection of a
`number of boosters, the emulation control module will likely
`decide that the polymorphic loop has been fully decrypted to
`reveal the static virus body. In this case, virus scanning will
`proceed.
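
The booster and stopper bookkeeping described above can be pictured with a minimal sketch. The opcode sets, the `stopper_limit` threshold, and the function name `cdpe_decide` are invented for illustration; the text does not list the cues that any actual CDPE product uses.

```python
# Hypothetical cue sets: placeholder choices, not taken from any real CDPE tool.
BOOSTERS = {0x30, 0x31, 0xE2}   # XOR forms and LOOP, plausible in decryptors
STOPPERS = {0xCD, 0x9A}         # INT and far CALL, assumed rare in decryptors

def cdpe_decide(opcodes, stopper_limit=2):
    """Return 'uninfected', 'scan', or 'undecided' for a stream of opcodes."""
    boosters = stoppers = 0
    for op in opcodes:
        if op in BOOSTERS:
            boosters += 1
        elif op in STOPPERS:
            stoppers += 1
            if boosters == 0 and stoppers >= stopper_limit:
                return "uninfected"   # stoppers seen before any booster
            if boosters > 0:
                return "scan"         # stopper after boosters: body likely decrypted
    return "undecided"
```

Under these assumptions, a decryption loop that deliberately begins with two stopper instructions is classified as uninfected before any booster is seen, which is the weakness the text attributes to fixed stopper lists.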
`The selection of boosters and stoppers included in the
`emulation control module can have a substantial impact on
`the speed and accuracy with which the CDPE system detects
`viruses. Ideally, stoppers and boosters are selected to work
`accurately for all known polymorphic viruses. However, it
`may not be possible to find a set of such heuristics that does
`not significantly slow virus scanning. Stoppers and boosters
`useful for detecting several polymorphic viruses may actu-
`ally prevent the detection of other polymorphic viruses, as
`for example, where a virus writer includes a standard stopper
`in polymorphic loop code to confuse CDPE modules. In
`general, any change in the stoppers or boosters used must be
`accompanied by extensive regression testing to ensure that
`previously detected viruses are not missed using the new
`heuristics. Since new polymorphic viruses are continually
`being developed, the time-consuming and awkward selection
`and regression testing of new combinations of stoppers
`and boosters cannot be avoided.
`
`Thus, there is a need for polymorphic virus detection
`systems that can be readily expanded to cover newly dis-
`covered viruses, without need for extensive regression test-
`ing and modification of the heuristics of the emulation
`control module. In addition, the system should be able to
`provide accurate results without emulating unnecessarily
`large numbers of instructions.
`SUMMARY OF THE INVENTION
`
`The present invention is a polymorphic anti-virus module
`or PAM (200) for detecting polymorphic viruses (150) using
`mutation-engine specific information for each known poly-
`morphic virus rather than heuristic stopper and booster code
`sequences. The PAM system (200) comprises a CPU emu-
`lator (210) for emulating the target program, a virus signa-
`ture scanning module (250) for scanning decrypted virus
`code, and an emulation control module (220), including a
`static exclusion module (230) and a dynamic exclusion
`module (240), for determining how long each target file is
`emulated before it is scanned. The emulation control module
`(220) also includes data (222) specific to each known
`polymorphic virus (150) and organized in a format that
`facilitates comparison with target files being tested for
`infection. This data (222) includes instruction/interrupt
`usage profiles (224) for the mutation engines (162) of the
`known polymorphic viruses (150), as well as size and target
`file types (226) for these viruses. The emulation control
`module (220) also includes a table (228) having an entry for
`each known polymorphic virus (150) which can be flagged
`when characteristics inconsistent with the polymorphic virus
`are detected.
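
One way to picture the per-virus data (222) and the flag table (228) is the sketch below. The class and field names are illustrative assumptions, not terms from the patent, and the flag convention follows the detailed description (every virus starts flagged, and a flag is reset when the virus is ruled out).

```python
from dataclasses import dataclass

@dataclass
class VirusProfile:
    """Per-virus data (222). Field names are illustrative only."""
    name: str
    usage_profile: int      # 256-bit mask (224): bit n set if opcode n is used
    file_types: frozenset   # target file types (part of 226), e.g. {"COM", "EXE"}
    min_size: int           # size data (part of 226), in bytes
    max_size: int

def new_flag_table(profiles):
    """Table 228: one flag per known virus, all initially under consideration."""
    return {p.name: True for p in profiles}

def exclude(table, name):
    """Reset a virus's flag once an inconsistent characteristic is found."""
    table[name] = False

def all_excluded(table):
    """True when no virus remains under consideration."""
    return not any(table.values())
```

Keeping the flags separate from the profiles lets the same profile data be reused for every target file while the table is rebuilt per file.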
`
`In accordance with the present invention, the static exclu-
`sion module (230) examines the gross characteristics of the
`target file for attributes that are inconsistent with the muta-
`tion engine specific data for known polymorphic viruses
`(150). These characteristics are the type of target file, the
`size of the target file’s load image, the presence of certain
`instructions at the file entry point, and the distance between
`the file entry point and the end of the load image. The last
`characteristic is useful because most viruses append them-
`selves to the files they infect. In some cases, the static
`exclusion module (230) allows certain target files to be
`identified as infected without any emulation.
`The dynamic exclusion module (240) examines the
`instruction/interrupt usage profiles (224) of each known
`polymorphic virus (150) as each instruction is fetched for
`emulation. The instruction/interrupt usage profiles (224)
`indicate which polymorphic viruses (150) employ mutation
`engines that do not use the fetched instruction in decryption
`loops they generate, and the emulation control module (220)
`flags these viruses. The emulation control module (220)
`continues until all mutation engines have been flagged or
`until a threshold number of instructions have been emulated.
`The flagging technique implemented by the dynamic exclusion
`module (240) determines when emulation has proceeded
`to a point where at least some code from the
`decrypted static virus body (160) may be scanned and
`substantially reduces the number of instructions emulated
`prior to scanning the remaining target files without resort to
`booster or stopper heuristics.
`It is not always necessary to fully decrypt the static virus
`body (160) to identify the underlying virus. In the preferred
`embodiment of the invention, the emulation control module
`(220) tracks those parts of virtual memory modified during
`emulation and periodically interrupts the emulation process
`to call the scanning module (250). The scanning module
`(250) tries to identify the virus type from the portion of
`decrypted static virus code (160). In order to speed up the
`process, the scanning module (250) implements a coarse
`scan of tagged memory locations to identify data bytes most
`likely to be associated with decrypted static virus code (virus
`signatures). It implements a more detailed binary search
`process only when selected bytes are encountered during the
`coarse scan. This approach greatly speeds up scanning
`without decreasing the accuracy of the scanning module
`(250). When code matching one of the viral signatures is
`identified, the PAM system (200) signals to the host com-
`puter that an infected file has been located.
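
The two-stage scan summarized above (a coarse pass that skips unlikely bytes, followed by a binary search only at candidate positions) might look like the following sketch. Keying the coarse pass on the first byte of each signature, and the names `build_index` and `scan_region`, are assumptions made for illustration; this portion of the text does not give the actual selection rule.

```python
import bisect

def build_index(signatures):
    """Sort the signatures and record which byte values can start one."""
    table = sorted(signatures)
    first_bytes = {sig[0] for sig in table}
    return table, first_bytes

def scan_region(data, table, first_bytes):
    """Return (offset, signature) for the first match in data, else None."""
    for i, byte in enumerate(data):
        if byte not in first_bytes:      # coarse scan: skip unlikely bytes
            continue
        # Detailed step: binary search to the signatures starting with this byte.
        j = bisect.bisect_left(table, bytes([byte]))
        while j < len(table) and table[j][0] == byte:
            if data.startswith(table[j], i):
                return i, table[j]
            j += 1
    return None
```

Most positions are rejected by a single set lookup, so the cost of the sorted-table search is paid only where a signature could actually begin.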
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIGS. 1A—1C are schematic representations of the load
`images of an uninfected computer file, a file infected by a
`virus, and a file infected by a polymorphic virus, respec-
`tively.
`FIG. 2 is a block diagram of a Polymorphic Anti-virus
`Module (PAM) in accordance with the present invention.
`FIG. 3 is an example of an instruction/interrupt usage
`profile employed in the emulation control module (220) of
`the present invention.
`FIG. 4A is a flowchart of the emulation process imple-
`mented by an emulation control module (220) in accordance
`with the present invention.
`FIG. 4B is a flowchart of the scanning process imple-
`mented by a scanning module (250) in accordance with the
`present invention.
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`Computer viruses infect a variety of files in a number of
`different ways. In the DOS environment, computer viruses
`have been used to infect three different types of executable
`files: COM files, SYS files, and EXE files. A common
`feature of these files is that at some point after loading,
`control of the computer is passed to the program code stored
`in the file. Computer viruses infect these executable files by
`attaching themselves to the file and modifying the machine
`language at the file entry point to transfer control to the virus
`rather than to the executable file. In order to camouflage
`their presence, computer viruses typically return control to
`the infected executable file once they have run.
`Referring to FIGS. 1A and 1B, there are shown executable
`images 100, 100' of an EXE file before and after infection,
`respectively, by a virus 130. Infection modes for EXE,
`COM, and SYS files are discussed in greater detail in
`Nachenberg, A New Technique For Detecting Polymorphic
`Computer Viruses, Master Thesis, University of California
`at Los Angeles (1995), which is hereby incorporated by
`reference. Executable image 100 comprises a header 110 and
`program code 120. Header 110 includes a signature (MZ),
`size field, a code segment field CS, an instruction pointer
`field IP, a stack segment field SS, and a stack pointer field SP.
`MZ indicates the file type (EXE in this case) and the
`following field specifies the size of executable image 100.
`CS and IP specify an entry point 122 of program code 120,
`and SS and SP point to the end of program code 120, where
`a stack (not shown) may be generated.
`Upon infection by computer virus 130, header 110' of
`executable image 100' is modified so that size field equals
`the size of executable image 100 incremented by the size of
`computer virus 130. In addition, computer virus 130 has
`replaced CS, IP of image 100 with CS', IP' in image 100'.
`CS', IP' point to an entry point 132 of virus 130 rather than
`entry point 122 of program code 120. Similarly, computer
`virus 130 has replaced SS, SP of image 100 with SS', SP' in
`image 100', which point to the end of virus 130. In order to
`return control of the computer to program code 120 follow-
`ing execution of virus 130, CS, IP, SS, and SP of uninfected
`image 100 are retained by virus 130.
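
The header fields named above can be read from a file with a few lines of code. The sketch below assumes the standard DOS MZ header layout (fourteen little-endian 16-bit words), which the text does not spell out; the function name and the returned dictionary keys are illustrative.

```python
import struct

def parse_mz_header(data):
    """Extract the fields discussed above from a DOS EXE (MZ) header."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Standard DOS header: fourteen little-endian 16-bit words.
    (_, last_page_bytes, pages, _, header_paras, _, _,
     ss, sp, _, ip, cs, _, _) = struct.unpack_from("<14H", data, 0)
    file_size = pages * 512
    if last_page_bytes:
        file_size -= 512 - last_page_bytes     # last 512-byte page is partial
    load_image_size = file_size - header_paras * 16
    # CS:IP is relative to the start of the load image; wrap at 1 MB as on 8086.
    entry_offset = (cs * 16 + ip) & 0xFFFFF
    return {"cs": cs, "ip": ip, "ss": ss, "sp": sp,
            "load_image_size": load_image_size,
            "entry_offset": entry_offset,
            "bytes_after_entry": load_image_size - entry_offset}
```

The `bytes_after_entry` value is the distance between the entry point and the end of the load image, which is small when an appended virus has redirected CS and IP to its own code.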
`Computer viruses 130 which are added to EXE, COM, or
`SYS files in the manner of FIG. 1B are relatively easy to
`detect. A virus detection program need only scan executable
`image 100' for code segments associated with known viruses
`130. These code segments, known as virus signatures, are
`code segments unique to different viruses, and their presence
`in an executable image 100' is taken as a clear indication that
`the corresponding file has been infected. A number of
`methods are available for scanning executable images 100'
`for virus signatures. Different viruses 130 may implement a
`number of strategies to hide their presence in an executable
`image 100'.
`One of the most successful strategies is that implemented
`by polymorphic viruses, which include a mutation engine
`that encrypts a static virus body according to a different
`(mutated) encryption key with each new infection. The
`encrypted virus is appended to the image with a mutated
`decryption routine, which decrypts the encrypted virus to
`reveal the static virus body only when the file is executed.
`The new appearance presented by such polymorphic viruses
`on each infection frustrates those detection methods which
`would simply scan images 100' for viral signatures.
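
A toy illustration of why plain signature scanning fails against an encrypted body is given below. It uses a harmless placeholder byte string and a single-byte XOR, both of which are assumptions for illustration; real mutation engines vary the decryption routine itself, not just a key.

```python
SIGNATURE = b"STATIC-BODY-MARKER"   # harmless placeholder, not a real signature

def xor_bytes(data, key):
    """XOR each byte with a one-byte key; the same call undoes itself."""
    return bytes(b ^ key for b in data)

def contains_signature(image):
    """Plain string scanning: look for the signature anywhere in the image."""
    return SIGNATURE in image

body = b"...." + SIGNATURE + b"...."
# Three "generations" of the same body, each encrypted under a different key.
generations = [xor_bytes(body, key) for key in (0x21, 0x5A, 0xC3)]
```

Each generation looks different and none contains the signature, yet applying the matching key again restores the same body. That is why emulating the decryption routine before scanning recovers something a scanner can match.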
`Referring now to FIG. 1C, there is shown an executable
`image 100" infected by a polymorphic virus 150. Polymor-
`phic virus 150 comprises a static virus body 160 including
`a mutation engine 162, both of which are shown hatched in
`FIG. 1C to indicate their encrypted state. On infection,
`mutation engine 162 generates a variable encryption routine
`(not shown) that encrypts static virus body 160 (including
`mutation engine 162) to prevent detection of polymorphic
`virus 150 by conventional scanning techniques. A decryp-
`tion routine 164, which is the dual of the encryption routine,
`is prepended to encrypted static virus body 160. When
`image 100" is executed, decryption routine 164 decrypts
`and passes control to static virus body 160, which employs
`the CPU of the host computer to attach itself to other files
`and implement whatever mischief its designer intends.
`One anti-virus detection scheme designed specifically for
`polymorphic viruses 150 is Cue Directed Program Emula-
`tion (CDPE). CDPE operates by emulating a target image
`100" or file for a sufficient number of instructions to allow
`a polymorphic virus 150 present in image 100" to decrypt
`itself and reveal its static virus body 160. Static virus body
`160 is then scanned to identify the type of virus present and
`steps are taken to deactivate it. As noted above, CDPE
`methods rely on the detection of prescribed stopper and
`booster code segments as a target file is emulated in order to
`determine whether an encrypted static virus body 160' is
`present, and if so, to determine whether it has been at least
`partially decrypted. The problem with this approach is that
`stopper and booster segments must be selected carefully and
`thoroughly tested in order to detect viruses accurately and
`completely. Stoppers and boosters added to detect new
`viruses require thorough regression testing to insure that
`they do not interfere with the detection of other polymorphic
`viruses. In short, stoppers and boosters that will work
`effectively with all polymorphic viruses must be identified,
`even as virus designers use these same stopper and booster
`heuristics to better camouflage their creations.
`In addition to the problems posed by any changes in the
`stopper, booster heuristics, CDPE emulation is done by
`virtual machines in order to isolate potentially infected files
`from the actual CPU and memory of the host computer.
`These virtual machines tend to operate slowly relative to the
`actual CPUs, and since each file must be checked, virus
`checking can be a very time consuming process. The speed
`of these programs is slowed further as more complicated
`heuristics are developed to detect polymorphic viruses.
`Referring now to FIG. 2, there is shown a block diagram
`of a polymorphic anti-virus module (PAM) 200 in accor-
`dance with the present invention. PAM 200 comprises an
`emulation module 210, an emulation control module 220,
`and a scanning module 250. As in CDPE systems, emulation
`module 210 allows PAM 200 to emulate a target file without
`allowing the target file to interact with either the actual CPU
`or memory of the host computer. Scanning module 250
`includes virus signatures 252 for identifying polymorphic
`viruses 150 and a scanning engine 254 for efficiently searching
`decrypted virus code for these signatures. The scanning
`engine 254 is discussed in greater detail below in conjunc-
`tion with FIG. 4B.
`
`Emulation control module 220 comprises virus profile
`data 222, a static exclusion module 230, and a dynamic
`exclusion module 240, which combine to substantially
`reduce the number of file instructions that must be emulated
`in order to determine whether a target file is infected by a
`virus. Virus profile data 222 comprises an instruction/interrupt
`usage profile 224 for each known polymorphic
`virus 150 as well as data on the sizes of known polymorphic
`viruses 150 and type of target files infected by each (size/type
`data 226). Size/type data 226 is accessed by static
`exclusion module 230 prior to emulation to eliminate certain
`polymorphic viruses 150 from consideration, and
`instruction/interrupt usage profiles 224 are accessed by
`dynamic exclusion module 240 during emulation to determine
`whether the emulated code may be part of a virus
`decryption loop. Emulation control module 220 also
`includes a table 228 of all known polymorphic viruses 150
`which is initialized with all viruses 150 flagged. As each
`virus 150 is eliminated from consideration by static or
`dynamic exclusion modules 230, 240, respectively, the cor-
`responding flags are reset to preclude further consideration
`of the virus.
`
`For example, gross features of executable image 100 that
`are inconsistent with various polymorphic viruses 150 allow
`the static exclusion module 230 to rule out infection of a
`target file 100 by these polymorphic viruses before any
`emulation is done. If features inconsistent with a polymor-
`phic virus 150 are detected in target file 100, the associated
`flag in table 228 is reset and it is excluded from further
`consideration during the subsequent emulation phase. If the
`gross features of target file 100 are inconsistent with infection
`by all known polymorphic viruses 150, no emulation is
`required to determine that target file 100 is uninfected, and
`the next target file may be considered. More often, analysis
`by static exclusion module 230 allows only some of polymorphic
`viruses 150 to be excluded from further consideration.
`
`Data on polymorphic viruses 150 considered by static
`exclusion module 230 are: (1) the type of target file each
`known polymorphic virus 150 is designed to attack; (2) the
`minimum size of the load image of each polymorphic virus
`150; (3) whether a polymorphic virus 150 uses a JMP
`instruction as the first instruction in a target COM file; and
`(4) the maximum size of the load image of each polymorphic
`virus 150. In order to take advantage of this data, static
`exclusion module 230 determines the target file type and
`load image size for a target file being analyzed. In addition,
`if a COM target file is being analyzed, static exclusion
`module 230 determines its first instruction, and if an EXE
`target file is being analyzed, static exclusion module 230
`determines the distance between the entry point and end of
`the load image. These gross characteristics and their relationship
`to features of known polymorphic viruses 150 are
`considered below.
`
`Type of Executable File Targeted
`Different viruses infect different executable file formats.
`Some infect only COM files, some infect only EXE files, and
`some infect both COM and EXE files. Very few viruses
`infect SYS files but some of these can infect EXE or COM
`files as well. Consequently, if target file 100 is an EXE file, all
`polymorphic viruses 150 that attack only COM files or SYS
`files may be excluded from further consideration in the
`analysis of target file 100. In this case, flags are reset in table
`228 for each of polymorphic viruses 150 excluded by the
`type of target file 100, and subsequent analysis of target file
`100 considers only unexcluded polymorphic viruses 150.
`Minimum Size of Polymorphic Virus
`Depending on the encryption routine employed, polymor-
`phic viruses 150 may generate executable images having a
`range of sizes. However, a minimum size for the executable
`image of each polymorphic virus 150 is provided by unen-
`crypted static virus body 160, including mutation engine
`162. Consequently, each polymorphic virus 150 having an
`executable image that is larger than the executable image
`100 of the target file being analyzed may be excluded from
`further consideration in that analysis.
`JMP Instruction Usage
`Many polymorphic viruses 150 that infect COM files do
`so by appending themselves to the COM file and inserting a
`JMP instruction at the entry point of the COM file.
`Consequently, when static exclusion module 230 examines
`a COM file and determines that the first instruction is not a
`JMP instruction, each polymorphic virus 150 that employs
`such an instruction at the entry point of infected COM files
`may be excluded from further consideration.
`Entry Point Distance in EXE Files
`Polymorphic viruses 150 that infect EXE files have a
`maximum load image size. Since these viruses infect EXE
`files by appending themselves to the EXE file load image,
`the distance between entry point 132 and the end of the load
`image must be less than this maximum value. Any poly-
`morphic viruses 150 having maximum sizes less than the
`distance calculated for an EXE file under analysis may be
`excluded from further consideration.
`
`This list of features examined during the static exclusion
`phase is not intended to be exhaustive. Additional features of
`polymorphic viruses 150 may also be suitable for use by
`static exclusion module 230 to exclude various polymorphic
`viruses 150 from further consideration.
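
The four static checks described above can be combined into one predicate, sketched below. The `SizeTypeData` record, its field names, and the set of direct JMP opcodes are assumptions for illustration; the text does not say how size/type data 226 is laid out or which JMP encodings are tested.

```python
from dataclasses import dataclass

@dataclass
class SizeTypeData:
    """Hypothetical layout for size/type data 226 of one virus."""
    file_types: frozenset   # e.g. frozenset({"COM"})
    min_size: int           # smallest possible viral load image, in bytes
    max_size: int           # largest possible viral load image, in bytes
    com_entry_jmp: bool     # True if infected COM files always start with JMP

JMP_OPCODES = {0xE9, 0xEB, 0xEA}   # direct near, short, and far JMP encodings

def statically_excluded(v, file_type, image_size, first_byte, entry_to_end):
    """True if the target file's gross features rule out virus v."""
    if file_type not in v.file_types:
        return True     # (1) virus does not attack this file type
    if v.min_size > image_size:
        return True     # (2) virus is larger than the whole load image
    if file_type == "COM" and v.com_entry_jmp and first_byte not in JMP_OPCODES:
        return True     # (3) the JMP this virus always inserts is absent
    if file_type == "EXE" and entry_to_end > v.max_size:
        return True     # (4) entry point is too far from the end of the image
    return False
```

For example, a COM-only virus with a 1,500-byte minimum size is excluded for any EXE target, for any COM target smaller than 1,500 bytes, and for any COM target whose first byte is not a JMP.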
`
`Typically, only a subset of known polymorphic viruses
`will be ruled out by static exclusion module 230 and some
`emulation will be required. In these cases, dynamic exclusion
`module 240 is implemented to initiate and control
`emulation module 210. During emulation, emulation control
`module 220 instructs emulation module 210 to fetch an
`instruction from load image 100. As each instruction is
`fetched, emulation control module 220 compares the fetched
`instructions with an instruction/interrupt usage profile 224
`for each known polymorphic virus 150. For each polymor-
`phic virus 150 that does not implement the fetched instruc-
`tion as indicated by its instruction/interrupt usage profile
`224, the corresponding flag in table 228 is reset to exclude
`the polymorphic virus from further consideration. This pro-
`cess continues with each succeeding instruction until all
`polymorphic viruses 150 have been excluded. Alternatively,
`the emulation phase for a target file may be stopped or
`suspended when one of two other conditions occurs. These
`conditions are discussed in greater detail below, in conjunc-
`tion with FIG. 4A.
`
`Referring now to FIG. 3, there is shown an example of an
`instruction/interrupt usage profile 224 employed by emulation
`control module 220 to detect a corresponding polymorphic
`virus 150. Instruction/interrupt usage profiles 224 are
`made possible by the fact that mutation engines 162 of
`known polymorphic viruses 150 do not use the entire
`instruction set available for various processor architectures.
`For example, the 80x86 instruction set allows for variable
`length instructions. However, in most cases the first byte of
`each instruction determines its basic functionality, providing
`256 possible basic instruction types. Mutation engines 162
`typically employ substantially fewer than 256 instruction
`types of the 80x86 instruction set.
`Referring still to FIG. 3, each bit of instruction/interrupt
`usage profile 224 corresponds to a different possible instruction
`type supported by the 80x86 architecture. A 1 indicates
`that mutation engine 162 of polymorphic virus 150 uses the
`corresponding instruction in its decryptors, and a 0 indicates
`that the instruction is not used by mutation engine 162. For
`example, the mutation engine employed by the Everfire
`polymorphic virus uses eight different instruction types in its
`decryptors, while the DSCE mutation engine uses 190
`different instruction types in its decryptors. Consequently,
`instruction usage profile 224 for the Everfire polymorphic
`virus includes only eight ones in the bits corresponding to
`these instructions. On the other hand, 190 of 256 bits of
`instruction usage profile 224 for the DSCE polymorphic
`virus are ones, making this virus more difficult to detect.
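
A 256-bit usage profile maps naturally onto an integer bitmask, as in the sketch below. The bit ordering (bit n stands for first-byte opcode n) and the eight opcode values are placeholders; the text gives neither the ordering used in FIG. 3 nor the actual Everfire instruction set.

```python
def make_profile(used_opcodes):
    """Pack a set of first-byte opcode values into a 256-bit integer mask."""
    mask = 0
    for op in used_opcodes:
        if not 0 <= op <= 0xFF:
            raise ValueError("opcode byte out of range")
        mask |= 1 << op
    return mask

def uses(profile, opcode):
    """True if the profile's bit for this opcode is 1."""
    return (profile >> opcode) & 1 == 1

def count_used(profile):
    """Number of instruction types marked as used."""
    return bin(profile).count("1")

# Eight placeholder opcodes standing in for a sparse profile.
sparse = make_profile({0x2E, 0x31, 0x46, 0x81, 0x8A, 0xB9, 0xE2, 0xEB})
# A dense profile with 190 of 256 bits set (which bits are set is arbitrary here).
dense = make_profile(range(190))
```

With only eight bits set, almost any instruction from an ordinary program hits a zero bit and excludes the virus at once. A profile with 190 bits set rules the virus out far less often per instruction, which is why such a virus is harder to detect.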
`The instructions/interrupts used by the mutation engine of
`a polymorphic virus may be determined by infecting a large
`number of files with the polymorphic virus and analyzing
`the decryption loops generated in the infected files. Virus
`infection may be done automatically under software control,
`and the resulting polymorphic decryption loops may likewise
`be analyzed automatically to generate instruction/interrupt
`usage profile 224 appropriate for the virus’ mutation
`engine. Further, since each polymorphic virus 150 is
`tested with data specific to its mutation engine 162, there is
`no need for regression testing when a new instruction/interrupt
`usage profile is added to emulation control module
`220.
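
Deriving a profile from many sample decryption loops amounts to taking the union of the opcodes observed, as in the sketch below. It assumes the loops have already been reduced to lists of first-byte opcodes by some analysis step not shown; that input format and the function name are hypothetical.

```python
def profile_from_samples(decryptor_opcode_streams):
    """OR together the first-byte opcodes seen across many sample decryptors."""
    profile = 0
    for stream in decryptor_opcode_streams:
        for op in stream:
            profile |= 1 << op      # mark this instruction type as used
    return profile
```

Because each profile is built only from samples of its own virus, adding a profile for a new virus leaves every existing profile unchanged, which is the basis for the claim that no regression testing is needed.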
`
`Polymorphic viruses 150 typically have no reason to use
`interrupts in their decryption loops, and consequently, these
`were used as “stoppers” by CDPE anti-virus program devel-
`opers. Not surprisingly, virus developers began to include
`interrupts in their decryption loops specifically because
`conventional CDPE programs would interpret their presence
`as an indication that the associated code was not part of a
`decryption loop. In the present invention, emulation control
`module 220 treats detection of an interrupt in a manner
`similar to detection of any instruction, to eliminate from
`further consideration each polymorphic virus 150 that does
`not employ such an interrupt in its decryption loop.
`Emulation control module 220 compares the instructions/
`interrupts fetched by emulator module 210 with the corre-
`sponding entry in instruction/interrupt usage profile 224 of
`each polymorphic virus 150 still under active consideration.
`When emulator 210 fetches an instruction (or an interrupt)
`that is not employed by one of polymorphic viruses 150 still
`being considered, the corresponding flag in table 228 is reset
`indicating that the virus should no longer be considered in
`the emulation phase of PAM 200.
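
The per-instruction comparison described above reduces to a bit test against each profile still under consideration. In the sketch below, the dictionary-based `profiles` and `flags` structures and the function name are assumptions for illustration.

```python
def dynamic_exclude(opcode, profiles, flags):
    """Reset the flag of every still-active virus whose profile lacks opcode.

    profiles: name -> 256-bit int mask (224); flags: name -> bool (table 228).
    Returns the names excluded by this instruction.
    """
    dropped = []
    for name, active in flags.items():
        if active and not (profiles[name] >> opcode) & 1:
            flags[name] = False     # virus no longer considered for this file
            dropped.append(name)
    return dropped
```

An interrupt is handled the same way as any other instruction here: it excludes exactly those viruses whose profiles do not include it, so it is never treated as a universal stopper.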
`Typically, emulation module 210 eventually retrieves an
`instruction that is not implemented by the last actively
`considered polymorphic virus 150. This indicates either that
`(1) target file 100 is not infected with any of polymorphic
`viruses 150 or (2) one of polymorphic viruses 150 has been
`decrypted to reveal static virus body 160, which like any
`program can employ any of the instructions supported by the
`CPU architecture. Most mutation engines 162 can be eliminated
`from consideration on one pass through the decryption
`loop of polymorphic virus 150. However, in order to eliminate
`the possibility of getting trapped in an infinite loop or
`spending too much time in the emulation phase, an upper
`limit may be set for the number of instructions to be
`emulated. In the preferred embodiment of the invention,
`emulation control module 220 terminates the emulation
`phase when either all polymorphic viruses 150 have been
`excluded or 1.5 million instructions have been executed.
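
The two termination conditions stated above can be sketched as a control loop. The `fetch` callback (which stands in for the emulator and returns the first opcode byte of each emulated instruction) and the extra "program ended" exit are assumptions added so the sketch is self-contained; only the 1.5 million limit comes from the text.

```python
MAX_INSTRUCTIONS = 1_500_000   # limit stated for the preferred embodiment

def run_emulation_phase(fetch, profiles, flags, limit=MAX_INSTRUCTIONS):
    """Emulate until every virus is excluded or the instruction limit is hit.

    fetch() is a hypothetical callback that emulates one instruction and
    returns its first opcode byte, or None when the program ends.
    Returns (instructions_emulated, reason).
    """
    count = 0
    while any(flags.values()):
        if count >= limit:
            return count, "limit reached"
        opcode = fetch()
        if opcode is None:
            return count, "program ended"   # assumption, not from the text
        count += 1
        for name in flags:
            if flags[name] and not (profiles[name] >> opcode) & 1:
                flags[name] = False         # profile lacks this opcode
    return count, "all excluded"
```

A target that loops forever on instructions every remaining profile allows is cut off at the limit, while a target that executes an instruction outside every profile ends the phase immediately.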
`
`Once the emulation phase has been terminated, scanning
`can begin on decrypted static virus body 160 or at least those
`parts decrypted by the first 1.5 million instructions. In order
`to facilitate scanning of static virus body 160, emulation
`control module 220 keeps track o