United States Patent [19]
Arnold et al.

[11] Patent Number: 5,440,723
[45] Date of Patent: Aug. 8, 1995

US005440723A
`
[54] AUTOMATIC IMMUNE SYSTEM FOR COMPUTERS AND COMPUTER NETWORKS

[75] Inventors: William C. Arnold, Mahopac; David M. Chess, Mohegan Lake; Jeffrey O. Kephart, Yorktown Heights; Steven R. White, New York, all of N.Y.

[73] Assignee: International Business Machines Corporation, Armonk, N.Y.

[21] Appl. No.: 4,872

[22] Filed: Jan. 19, 1993
`
[51] Int. Cl.6 .............................. G06F 11/00
[52] U.S. Cl. .............. 395/181; 395/700; 395/183.09; 395/183.14
[58] Field of Search ........ 395/575; 371/16.5, 19, 11.2, 8.2
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`.......................... 371/1675
`5,062,045 10/1991 Janis et a1,
`395/575
`5,084,816
`1/1992 Boese et al.
`5,121,345
`1/1992 Lentz ............
`.:. 364/550
`
`4/1993 Hamilton et al,
`.
`5,200,958
`371/165
`
`...... 371/19
`5,218,605
`1/1993 Low et a1.
`.. 371/165
`5,255,208 10/1993 Thakore et a].
`
`5,278,901
`1/1994 Shieh et al.
`380/4
`
`3/1994 Ohnishi et al.
`371/16.5
`5,291,590
`
`3/1994 Clark ............... 371/19
`5,297,150
`
`395/575
`5,319,776
`6/1994 Hile et al.
`
`.
`N 380/4
`5,359,659 10/1994 Rosenthal
`5,361,359 11/1994 Tajallie et a1.
`...................... ’395/700
OTHER PUBLICATIONS

Qasem et al., "AI Trends in Virus Control", 1991 IEEE Proc. of Southeastcon, pp. 99-103, vol. 1.
Crocker et al., "A Proposal for a Verification-Based Virus Filter", 1989 IEEE Symposium on Security & Privacy, pp. 319-324.
Kephart et al., "Directed-Graph Epidemiological Models of Computer Viruses", 1991 IEEE Computer Society Symposium on Research in Security & Privacy, pp. 343-359.
Kumar et al., "A Generic Virus Scanner in C++", 1992 8th Ann. Computer Security Applications Proceedings, pp. 210-219.
Shoutkov et al., "Computer Viruses: Ways of Reproduction in MS DOS", 25th Ann. 1991 IEEE International Carnahan Conf. on Security Tech., pp. 168-176.
(List continued on next page.)
`
Primary Examiner—Robert W. Beausoliel, Jr.
Assistant Examiner—Joseph E. Palys
Attorney, Agent, or Firm—Perman & Green
`
`[57]
`
`ABSTRACT
`
A method includes the following component steps, or some functional subset of these steps: (A) periodic monitoring of a data processing system (10) for anomalous behavior that may indicate the presence of an undesirable software entity such as a computer virus, worm, or Trojan Horse; (B) automatic scanning for occurrences of known types of undesirable software entities and taking remedial action if they are discovered; (C) deploying decoy programs to capture samples of unknown types of computer viruses; (D) identifying machine code portions of the captured samples which are unlikely to vary from one instance of the virus to another; (E) extracting an identifying signature from the executable code portion and adding the signature to a signature database; (F) informing neighboring data processing systems on a network of an occurrence of the undesirable software entity; and (G) generating a distress signal, if appropriate, so as to call upon an expert to resolve difficult cases. A feature of this invention is the automatic execution of the foregoing steps in response to a detection of an undesired software entity, such as a virus or a worm, within a data processing system. The automatic extraction of the identifying signature, the addition of the signature to a signature database, and the immediate use of the signature by a scanner provides protection from subsequent infections of the system, and also a network of systems, by the same or an altered form of the undesirable software entity.

46 Claims, 7 Drawing Sheets
`
[Front-page figure: excerpt of the FIG. 2 flow chart, including the blocks "Monitor system for anomalous behavior" and "Identify invariant code portions of virus sample(s)".]

SOPHOS
EXHIBIT 1008 - PAGE 0001
`
`
`
`5,440,723
`Page 2
`
`OTHER PUBLICATIONS
`S. W. Shieh et al. “A Pattern—Oriented Intrusion—De-
`tection Model and its Applications”, Proceedings of the
`1991 IEEE Computer Society Symposium on Reserach
`and Privacy, pp. 327—342.
`H. S. Javitz et al. “The SRI IDES Statistical Anomaly
`Detector", Proceedings of the 1991 IEEE Computer
`Symposium on Research in Security and Privacy, pp.
`316—326.
`
`W. Arnold et a]. “System for Detecting Undesired Al-
`teration of Software”, IBM TDB, vol. 32, No. 11, Apr.
`1990, pp. 48—50.
`S. M. Katz, “Estimation of Probabilities from Sparse
`Data for the Language Model Component of a Speech
`Recognizer", IEEE Trans. ASSP—35, No. 3, Mar. 1987.
`pp. 400—401.
`F. Cohen, A Short Course on Computer Viruses, ASP
`Press, Pittsburg, 1990, pp. 9—15.
`
`SOPHOS
`EXHIBIT 1008 - PAGE 0002
`
`SOPHOS
`EXHIBIT 1008 - PAGE 0002
`
`
`
U.S. Patent    Aug. 8, 1995    Sheet 1 of 7    5,440,723

[FIG. 1A: block diagram of the data processing system 10 — bus 12, CPU 14, RAM 16, terminal control subsystem 18 with display device 20 and manual input 22, hard disk system control 24 with hard disk drive 26, floppy disk system control 28 with floppy disk drive 30 and diskette 30a, and network adapter coupling the system to a network.]

[FIG. 5: flow chart fragment — exact-match and fuzzy-match calculations.]
`
`
`
U.S. Patent    Aug. 8, 1995    Sheet 2 of 7    5,440,723

[FIGS. 1B and 1C: exemplary computer network of processors, showing an infected processor signaling to other processors.]
`
`
`
U.S. Patent    Aug. 8, 1995    Sheet 3 of 7    5,440,723

[FIG. 2 — flow chart of the method:]
A: MONITOR SYSTEM FOR ANOMALOUS BEHAVIOR
   (anomalous behavior detected)
B: SCAN FOR KNOWN VIRUSES
   (virus detected) B1: ALERT USER, REMOVE VIRUS
   (unknown virus)
C: DEPLOY DECOY PROGRAMS TO CAPTURE UNKNOWN VIRUS
D: IDENTIFY INVARIANT CODE PORTIONS OF VIRUS SAMPLE(S)
E: EXTRACT SIGNATURE FROM CODE, ADD TO SCANNER'S DATABASE
F: INFORM OTHER NETWORK PROCESSORS OF VIRUS INFECTION
G: GENERATE DISTRESS SIGNAL, IF REQUIRED

FIG. 2
`
`
`
U.S. Patent    Aug. 8, 1995    Sheet 4 of 7    5,440,723

[FIG. 3: more detailed flow chart depicting the operation of the method.]
`
`
U.S. Patent    Aug. 8, 1995    Sheet 5 of 7    5,440,723

[FIG. 4 — flow chart:]
A: FILTER OUT PORTIONS OF VIRUS THAT ARE OBSERVED TO VARY FROM ONE INSTANCE TO ANOTHER
B: FILTER OUT PORTIONS OF VIRUS THAT ARE INTRINSICALLY THE MOST LIKELY TO VARY FROM ONE INSTANCE TO ANOTHER

[FIG. 6A: byte 1 through byte n of a 1st, 2nd, and 3rd instance of a virus, showing variant portions flanking an "invariant" portion.]

[FIG. 6B: exemplary 24-byte "invariant" portion (byte 1 through byte 24):]
8E DE 8C 06 01 00 80 C6 8E DE 31 C0
31 DB 31 C9 31 D2 CB F9 F8 FF 6F F5
`
`
`
U.S. Patent    Aug. 8, 1995    Sheet 6 of 7    5,440,723

[FIG. 7 — flow chart:]
SEGREGATE CORPUS INTO PROBE, TRAINING AND TEST SETS
SELECT BYTE-STRINGS FROM PROBE SET
USE TRAINING SET TO ESTIMATE PROBABILITIES
COUNT FREQUENCY IN TEST SET
PRODUCE LIST OF ESTIMATED PROBABILITIES VS. FREQUENCY
DETERMINE FALSE-POSITIVE AND FALSE-REJECTION PROBABILITY
SELECT THRESHOLD
`
`
`
U.S. Patent    Aug. 8, 1995    Sheet 7 of 7    5,440,723

[FIG. 8: block diagram of a system operable for executing the method of the invention.]
`
`
`
`
`
`
`
`
`
`1
`
`5,440,723
`
AUTOMATIC IMMUNE SYSTEM FOR COMPUTERS AND COMPUTER NETWORKS
`
`CROSS-REFERENCE TO A RELATED PATENT
`APPLICATION
`
This patent application is related to commonly assigned U.S. patent application Ser. No. 08/004,871, filed Jan. 19, 1993, entitled "Methods and Apparatus for Evaluating and Extracting Signatures of Computer Viruses and Other Undesirable Software Entities", by Jeffrey O. Kephart.
`FIELD OF THE INVENTION
`
`This invention relates generally to digital data pro-
`cessors and, in particular, to methods and apparatus for
`providing computational integrity for digital data pro-
`cessors and networks thereof.
`
`BACKGROUND OF THE INVENTION
`
`A computer virus has been defined by Frederick B.
`Cohen as a program that can infect other programs by
modifying them to include a, possibly evolved, version of itself (A Short Course on Computer Viruses, ASP Press, Pittsburgh, 1990, page 11).
`As employed herein, a computer virus is considered
`to include an executable assemblage of computer in—
`structions or code that is capable of attaching itself to a
`computer program. The subsequent execution of the
`viral code may have detrimental effects upon the opera-
`tion of the computer that hosts the virus. Some viruses
`have an ability to modify their constituent code, thereby
`complicating the task of identifying and removing the
`virus.
`A worm is a program or collection of programs that
`can cause a possibly evolved version of itself to be exe-
`cuted on the same or possibly on a different computer.
`A Trojan Horse is a block of undesired code that is
`intentionally hidden within a block of desirable code.
Computer viruses, worms, and Trojan Horses are all considered to be members of a class of undesirable software entities, the presence of which within a data processor, or network of data processors, is to be avoided so as to maintain computational integrity.
`A widely-used method for the detection of computer
`viruses is known as a virus scanner. A virus scanner
`employs short strings of bytes to identify particular
`viruses in executable files, boot records, or memory.
The byte strings (referred to as signatures) for a particular virus must be chosen with care such that they always discover the virus, if it is present, but seldom give a "false alarm", known as a false positive. That is, the signature must be chosen so that the byte string is one that is unlikely to be found in programs that are normally executed on the computer. Typically, a human expert makes this choice by converting the binary machine code of the virus to an assembler version, analyzing the assembler code, selecting sections of code that appear to be unusual or virus-like, and identifying the corresponding bytes in the binary machine code so as to produce the signature. Wildcard bytes can be included within the signature to provide a match with any code byte appearing in a virus.
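Signature matching of the kind just described can be illustrated with a short sketch. This is not the patent's implementation; the function names and the signature representation (a list of byte values, with None as a wildcard position) are assumptions made for illustration.

```python
# Sketch of signature matching with wildcard bytes. A signature is a
# list of byte values; None marks a wildcard that matches any byte.

def matches_at(data: bytes, sig, offset: int) -> bool:
    """True if the signature matches data starting at offset."""
    if offset + len(sig) > len(data):
        return False
    return all(s is None or data[offset + i] == s
               for i, s in enumerate(sig))

def scan(data: bytes, sig) -> int:
    """Return the first offset where the signature matches, or -1."""
    for offset in range(len(data) - len(sig) + 1):
        if matches_at(data, sig, offset):
            return offset
    return -1

# Example: 0x8E 0xDE ?? 0x06 matches any byte in the wildcard position.
signature = [0x8E, 0xDE, None, 0x06]
print(scan(bytes([0x00, 0x8E, 0xDE, 0x8C, 0x06, 0x01]), signature))  # 1
```

A production scanner would match many signatures simultaneously (for example with an Aho-Corasick automaton), but the semantics per signature are as above.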
Currently, a number of commercial computer virus scanners are successful in alerting users to the presence of viruses that are known a priori. However, conventional virus scanners are typically unable to detect the presence of computer viruses which they have not been programmed to detect explicitly. The problem of dealing with new viruses has typically been addressed by distributing updates of scanning programs and/or auxiliary files containing the necessary information for identifying the latest viruses. However, the increasing rate at which new viruses are being written is widening the gap between the number of viruses that exist and the number of viruses that can be detected by an appreciable fraction of computer users. As a result, it is becoming increasingly likely that a new virus will become widespread before any remedy is generally available.
At least one mechanism to inform other computers connected to a network of a presence of a viral infection has been previously advanced. For example, a "net hormones" concept has been proposed by David Stodolsky, wherein every computer on a network keeps archives of all potential virus-carrying contacts with every other computer. When a computer finds that it is infected with a virus, it sends a message to every computer that it has ever contacted or been contacted by, which in turn send messages to all of their contactees or contactors, etc. However, it is believed that this approach will result in most or all computers quickly becoming overloaded by infection messages from most of the other computers on the network.

As such, a need has arisen to develop methods for automatically recognizing and eradicating previously unknown or unanalyzed viruses on individual computers and computer networks. An efficient method is also required for informing other computers on the network as to the existence of a computer virus within the network.
`
`It is an object of this invention to provide methods
`and apparatus to automatically detect and extract a
`signature from an undesirable software entity, such as a
`computer virus or worm.
It is a further object of this invention to provide methods and apparatus for immunizing a computer system, and also a network of computer systems, against a subsequent infection by a previously unknown and undesirable software entity.
`SUMMARY OF THE INVENTION
`
`The foregoing and other problems are overcome and
`the objects of the invention are realized by a method
`that provides computational integrity for a digital data
`processor and a network of digital data processors. The
`method includes the following component steps, or
`some functional subset of these steps:
`(A) periodic monitoring of a data processing system
`for anomalous behavior that may indicate the presence
`of an undesirable informational state, such as one arising
`from the presence of an undesirable software entity,
`such as a computer virus or worm;
`(B) automatic scanning for occurrences of known
`types of undesirable software entities;
`(C) deploying decoy programs to capture samples of
`unknown types of computer viruses;
`(D) automatically identifying portions of a captured
`virus sample which are likely to remain invariant from
`one instance of the virus to another;
`(E) extracting an identifying signature from the in-
`variant portion and adding the signature to a signature
`database;
`(F) informing neighboring data processing systems
`on a network of an occurrence of the undesirable soft-
`ware entity; and
`
`
`
`
`3
`(G) generating a distress signal, if appropriate, so as
`to call upon an expert to resolve difficult cases.
`A feature of this invention is the automatic execution
`of the foregoing steps in response to a detection of an
`undesired software entity, such as a virus, worm, or
`Trojan Horse, within a data processing system. Further-
`more, the automatic extraction of the identifying signa-
`ture, and the addition of the signature to a signature data
`base, provides protection, or “immunity”, to subsequent
`infections of the system, and also a network of systems, 10
`by the same or an altered form of the undesirable soft-
`ware entity.
BRIEF DESCRIPTION OF THE DRAWING

The above set forth and other features of the invention are made more apparent in the ensuing Detailed Description of the Invention when read in conjunction with the attached Drawing, wherein:

FIG. 1a is a block diagram of a computer system that is suitable for use in practicing the invention;
FIG. 1b illustrates an exemplary computer network that includes the computer system of FIG. 1a;
FIG. 1c illustrates the exemplary computer network of FIG. 1b and the use of a kill signal that is transmitted by infected computers to neighboring computers;
FIG. 2 is a flow chart depicting the constituent steps of a method of the invention;
FIG. 3 is a more detailed flow chart depicting the operation of the method of the invention;
FIG. 4 is a flow chart depicting a method for use in segregating variant virus portions from invariant virus portions;
FIG. 5 is a flow chart that shows the operation of a statistically based computer virus signature extraction method;
FIG. 6a illustrates exemplary portions of three instances of a computer virus, the three portions each including an invariant portion and variant portions;
FIG. 6b illustrates an exemplary 24-byte "invariant" portion of a computer virus;
FIG. 7 is a flow chart that depicts a method of selecting a probability threshold used for accepting or rejecting a candidate signature; and
FIG. 8 is a block diagram of a system operable for executing the method of the invention.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
FIG. 1a is a block diagram of a data processing system 10 that is suitable for practicing the teaching of the invention. A bus 12 is comprised of a plurality of signal lines for conveying addresses, data, and controls between a Central Processing Unit 14 and a number of other system bus units. A RAM 16 is coupled to the system bus 12 and provides program instruction storage and working memory for the CPU 14. A terminal control subsystem 18 is coupled to the system bus 12 and provides outputs to a display device 20, typically a CRT monitor, and receives inputs from a manual input device 22, typically a keyboard. Manual input may also be provided from a pointing device, such as a mouse. A hard disk control subsystem 24 bidirectionally couples a rotating fixed disk, or hard disk 26, to the system bus 12. The control 24 and hard disk 26 provide mass storage for CPU instructions and data. A floppy disk control subsystem 28 bidirectionally couples one or more floppy disk drives 30 to the system bus 12. The floppy disk drive 30 works in conjunction with a removable floppy diskette 30a. A network adapter (NA) 31 is provided for coupling the system 10 to a network.

The components illustrated in FIG. 1a may be embodied within a personal computer, a portable computer, a workstation, a minicomputer, a mainframe computer, or a supercomputer. As such, the details of the physical embodiment of the data processing system 10, such as the structure of the bus 12 or the number of CPUs 14 that are coupled to the bus, are not crucial to the operation of the invention, and are not described in further detail below.

As employed herein, the informational state of the data processing system 10, at a given moment in time, is considered to be defined by the contents (bytes) of all memory or storage devices to which the system has access. These include, but are not limited to, RAM, registers, hard disks, floppy disks that are inserted into floppy disk drives, and magnetic tape that is inserted into a tape drive.
The informational state history of the system 10 is considered to be the history of its informational state at all moments in time.

For the case where the data processing system 10 (P1 of FIG. 1b) is connected to other data processing systems (P2-P7) via a network, such as the exemplary network depicted in FIG. 1b, an additional data processor (PD) may be added as a dedicated decoy program server, as described below.

The method of this invention includes the following component steps, as illustrated in FIG. 2, or some functional subset of these steps.

Step A: Periodic monitoring of the data processing system 10 for anomalous, potentially virus-like behavior.
`
`Step B: Automatic scanning for occurrences of
`known viruses, and removal of any which are found to
`be present.
`Step C: Deployment of decoy programs to capture
`virus samples.
`Step D: Segregation of a captured virus sample into
`portions that are likely to vary from one instance of the
`computer virus to another instance, and into portions
`that are likely to be invariant from one instance to an-
`other.
`Step E: Extraction of a viral signature from the in-
`variant portion(s) and the addition of the signature to a
`signature database.
Step F: Informing neighboring data processing systems on a network of an occurrence of a viral infection.

Step G: Generation of a distress signal, if appropriate, so as to call upon human experts to resolve difficult cases.
`A feature of this invention is the automatic execution
`of the foregoing steps in response to a detection of an
`undesired software entity, such as a virus or a worm,
`within the data processing system 10.
A discussion is now made of each of the above referenced Steps A-G of FIG. 2.
`
`Step A: Anomaly Detection
`The first step in the process of this invention detects
`anomalous system behavior of a type that may indicate
`the presence of an undesirable informational state re-
`sulting from the presence of a virus or some other unde-
`sirable software entity, such as a worm or a Trojan
`Horse. The detection of anomalous behavior within a
`computer or computer network can be accomplished
through the use of known techniques, preferably a technique that detects anomalies that may indicate a virus. One suitable anomaly detection technique, that uses patterns of activity undertaken by computational processes, is described in an article entitled "A Pattern-Oriented Intrusion-Detection Model and its Applications", S. W. Shieh and V. D. Gligor, Proceedings of the 1991 IEEE Computer Society Symposium on Research in Security and Privacy. Another suitable technique detects unexpected changes in executable or resource files. As an example, F. Cohen, A Short Course on Computer Viruses, ASP Press, Pittsburgh, 1990, discusses the use of integrity shells employing modification detectors. In this case, modification of files is detected by cryptographic or non-cryptographic checksums, and the user is notified if a modification is detected.

The method at Step A may employ any one or a combination of these or similar techniques. The end result is the detection by the system 10 of an anomalous system behavior that may be indicative of the presence of a computer virus, or some other undesirable software entity.

The anomaly detection process or processes may be run continually as a background task, or periodically as a foreground task, depending on the desire of the user and/or the nature of the underlying operating system.
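The modification-detector form of Step A can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation; the function names are invented, and SHA-256 stands in for whichever cryptographic or non-cryptographic checksum is chosen.

```python
# Sketch of a checksum-based modification detector: record a digest of
# each monitored file, then periodically re-hash the files and report
# any whose digest has changed since the baseline was taken.
import hashlib

def snapshot(paths):
    """Map each path to the SHA-256 digest of its current contents."""
    digests = {}
    for path in paths:
        with open(path, "rb") as f:
            digests[path] = hashlib.sha256(f.read()).hexdigest()
    return digests

def modified_files(baseline, paths):
    """Return the paths whose digest differs from the recorded baseline."""
    current = snapshot(paths)
    return [p for p in paths if current[p] != baseline.get(p)]
```

Any non-empty result from modified_files would count as the anomalous behavior that triggers Step B.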
`
Step B: Scan for Known Viruses

If preliminary evidence of virus-like activity is detected, additional computational resources are devoted to obtaining more conclusive evidence of viral infection. If a virus is present, there is a high probability that the virus is a well-known virus that can be readily detected by a virus scanner. Thus, Step B scans an informational state history of the system, that is, all relevant media (e.g. files, boot records, memory), for known viruses. In one implementation, a pre-existing virus scanner is activated to search for a large set of patterns or signatures, each of which pertains to a different virus or virus family. It is highly desirable (but not essential) that the virus scanner be capable of detecting slight variations on known viruses. Many conventional scanners tend to possess this capability, to a limited extent, simply by virtue of the fact that they search based only on short contiguous strings of bytes found in the virus. One especially suitable virus scanner, known as VIRSCAN, was available from the International Business Machines Corporation until recently, when it was incorporated into the IBM AntiVirus program. This virus scanner is particularly useful for detecting variations in known viruses because it permits a certain number of mismatches between a string of bytes in a file being examined and the virus signature string. This capability reduces the chance of having to proceed to Step C (decoy deployment), and is even more useful in the event that the virus is completely unrelated to any known virus, as is described below with respect to Step E. If the anomaly is found to be due to a known virus or some slight alteration of it, the method proceeds to Step B1 where the user is alerted, and the virus removed (killed) by traditional methods, such as restoration from backup (either automatically or manually by the user) or disinfection (removal of the virus from all of the software it has infected). In general, disinfection is only acceptable if the virus is found to be an exact copy of a known virus. This implies that the system verify the identification made by the virus scanner.
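The patent does not disclose VIRSCAN's internals beyond the fact that a bounded number of byte mismatches is tolerated; a naive sketch of that idea, with invented names, might look like:

```python
# Naive sketch of mismatch-tolerant signature scanning: a signature
# matches at an offset if it differs from the examined bytes in at
# most `max_mismatches` positions.

def fuzzy_scan(data: bytes, sig: bytes, max_mismatches: int) -> int:
    """First offset where sig matches data with at most
    max_mismatches byte differences, or -1 if none."""
    for offset in range(len(data) - len(sig) + 1):
        mismatches = 0
        for i, s in enumerate(sig):
            if data[offset + i] != s:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:
            return offset
    return -1
```

This naive loop is quadratic; a real scanner would use a more efficient approximate-matching algorithm, but the tolerance semantics are the same.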
`
`Step C: Decoy Deployment
`If no known virus is detected, the anomaly may be
`due to some unknown virus. In this case, the method
`employs one or more decoy programs in an attempt to
`attract and obtain one or more samples of the unknown
`virus. The decoy programs could be commercially-
`available computer programs which are installed on
`(and are capable of being executed on) the computer
`system. Preferably, however, they would be special-
`purpose programs which exist solely for the purpose of
`capturing viruses.
`For the case of a DOS-based system, a variety of
`decoy programs having *.COM and *.EXE extensions
`(also referred to as executables) are created, executed,
`read, written to, copied, etc. The use of decoy programs
`for this purpose is described by W. C. Arnold and D. M.
`Chess in “System for Detecting Undesired Alteration of
`Software”, IBM Technical Disclosure Bulletin, Vol. 32,
`No. 11, pages 48—50, April 1990.
`In operation, original copies of the decoy programs
`are stored securely, by example in encrypted form or on
`a separate server, along with checksums of the originals.
`After a specified time interval, or after other specified
`conditions are met, checksums of some or all of the
`decoy programs are taken and compared directly to
`their originals. Alternatively, the decoy programs are
`compared directly to their originals. If none of the de-
`coys are found to have changed, the user is alerted to
`the fact that an anomaly of some sort was detected, but
`that the system was unable to find conclusive evidence
`of a virus. In this case, it is desirable to maintain the
`system in a moderate state of alert for some time period,
`exercising the decoy programs at a reduced priority or
`less frequent intervals.
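The decoy bookkeeping just described (securely stored originals plus checksum comparison) might be sketched as follows; the names are hypothetical, and in-memory copies stand in for the encrypted or server-held originals.

```python
# Sketch of decoy integrity checking: retain each decoy's original
# bytes and checksum, then compare the exercised decoys against them
# and report any decoy whose contents changed.
import hashlib

def protect(decoy_files):
    """Record each decoy's original bytes and SHA-256 checksum."""
    return {name: (data, hashlib.sha256(data).hexdigest())
            for name, data in decoy_files.items()}

def check_decoys(protected, current_files):
    """Return the names of decoys whose contents have changed."""
    changed = []
    for name, (original, digest) in protected.items():
        if hashlib.sha256(current_files[name]).hexdigest() != digest:
            changed.append(name)
    return changed
```

Each changed decoy can then be diffed against its retained original to isolate the viral code, as described for Step D.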
`It is desirable to have a large variety of decoy pro—
`grams that differ from one another in length, content,
`and name. Furthermore, the decoy programs should be
`placed in and run from various directories, particularly
`those directories in which there are modified executa—
`bles.
`Some DOS viruses are known to become resident in
`memory and are active until the computer is rebooted.
`In such a case, the virus is able to infect any executable
`at any time. Often, a resident virus will infect executa-
`bles as they are brought into memory to be executed.
`Thus, if several decoy programs are run, there is a rea-
`sonable chance that one or more decoy programs will
`be infected by such a virus. In addition to running the
`decoys, the system should perform a variety of other
`operations with the decoy programs, including copying
`them, opening them, creating them, etc. All of these
`operations increase the probability of infection by a
`virus that is resident in memory.
`However, some DOS viruses do not go resident, and
`can infect other programs only when the host program
`in which the virus is imbedded is executed. Such viruses
`typically use static criteria for selecting a program to
`infect, such as the program name or the directory in
`which the program is located. In this case, it is more
difficult for the decoy programs to become infected. To increase the chance of infection, decoy programs are preferably placed in locations where they are more likely to be selected, such as the directory of the possibly-infected executable, or directories containing system programs (e.g. the root directory) and commonly-used utility programs. In order to infect the decoy programs, the potentially-infected executable should be run repeatedly.

If one or more decoy programs is subsequently found to have changed from the original, protected version, it can be assumed that the changes are due to a virus. A comparison of each modified decoy program with its corresponding uninfected version enables a copy of the virus to be isolated from each decoy.

Step D: Identification of Invariant Code Portions

In order to maximize the likelihood that a chosen signature will appear in all or nearly all progeny of the virus, it is preferred to filter out all portions of the virus which are observed to, or are likely to, vary from one instance of the virus to another. To achieve this, the method employs one, or preferably both, of the techniques represented as Blocks A and B in FIG. 4. The order in which the two techniques are used can be reversed, and the following description is correspondingly modified to accommodate the opposite ordering of the two techniques.

If several decoy programs are found to be infected, a most desirable case, it is important to compare the various versions of the isolated virus programs with one another (Block A). If all versions are identical, or if there is only one version, the entire virus is considered to be "invariant", and can be passed along to Block B. Otherwise, if the various isolated computer virus programs differ, but have substantial areas in common, it can be assumed that the virus alters itself when it replicates. As a result, only the common areas are marked for further examination by the heuristics employed in Block B of FIG. 4. This procedure is illustrated in FIG. 6a, in which the variant and "invariant" portions of a virus are identified by comparing three instances of it. In FIG. 6b, each byte is shown expressed in a hexadecimal representation, and an invariant portion is identified. The length of the invariant sequence of bytes is exemplary. In practice, invariant sequences of bytes are often considerably longer than five bytes, and more than one such portion typically appears within the virus. If there are no substantially similar areas, this may indicate the presence of a relatively sophisticated self-altering virus, or the presence of more than one virus. In this case, each of what appear to be different viruses is provided separately to the extraction method, which determines one or more signatures for each. The system is also placed in a moderate state of alert, and the user is informed.

It is noted that a situation may arise in which the computer is attacked by a sophisticated, unknown virus, every instance of which looks completely different. In this case, at the completion of Block A in FIG. 4 the virus detector may be unable to determine whether there are several viruses, or one virus that appears different every time it is replicated. The use of a distress signal, described below in relation to Step G, may be appropriate for this situation.

At this point, there are one or more sections of a virus program (or a set of different viruses) that are marked as being substantially "invariant". However, even if a given section of the virus program is common to all copies which have been obtained in Step C, this does not guarantee that the section will appear in its entirety in every instance of the virus. The purpose of Block B is to filter out portions of the "invariant" sections which by their nature are more likely to be variable. As a general rule, non-executable "data" portions of programs, which can include representations of numerical constants, character strings, screen images, work areas for computations, addresses, etc., tend to be more likely to vary from one instance of the program to another than "code" portions, which represent machine instructions. It may be that most or all of the "data" bytes in a given virus are the same in all instances of that virus, but without a careful analysis this is difficult to establish with certainty. Thus, a conservative approach assumes that "data" bytes may change, and therefore it is desirable to disqualify them from being chosen as portions of candidate signatures. Perfect classification of each byte as being "code" or "data" is not necessary. An erroneous classification of "code" bytes as "data" bytes is more tolerable than an erroneous classification of "data" bytes as "code" bytes.
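For the simple case of equal-length samples aligned byte-for-byte as in FIG. 6a, the Block A comparison might be sketched as follows. The patent does not prescribe this algorithm, and the names are hypothetical; real samples would first need alignment.

```python
# Sketch of the Block A comparison: byte-wise alignment of several
# virus samples, keeping maximal runs of positions on which every
# instance carries the same byte (candidate "invariant" portions).

def invariant_runs(samples, min_len=4):
    """Return (start, end) index pairs of maximal runs where all
    samples agree; runs shorter than min_len are dropped."""
    n = min(len(s) for s in samples)
    runs, start = [], None
    for i in range(n + 1):
        same = i < n and all(s[i] == samples[0][i] for s in samples)
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs
```

Each reported run would then be handed to Block B, which further discards the bytes classified as "data".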