throbber
Purdue University
`Purdue e-Pubs
`
`Computer Science Technical Reports
`
`Department of Computer Science
`
`1992
`
`A Generic Virus Scanner in C++
`
`Sandeep Kumar
`
`Eugene H. Spafford
`Purdue University, spaf@cs.purdue.edu
`
`Report Number:
`92-062
`
`Kumar, Sandeep and Spafford, Eugene H., "A Generic Virus Scanner in C++" (1992). Computer Science Technical Reports. Paper 983.
`http://docs.lib.purdue.edu/cstech/983
`
`This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact epubs@purdue.edu for
`additional information.
`
`0001
`
`Blue Coat Systems - Exhibit 1047
`
`

`
`A GENERIC VIRUS SCANNER IN C++
`
`Sandeep Kumar
`Eugene H. Spafford
`
`CSD-TR-92-062
`September 17, 1992
`
`0002
`
`

`
`A Generic Virus Scanner in C++ *
`Technical Report CSD-TR-92-Q62
`
`Sandeep Kumar
`
`Eugene H. Spafford
`
`The COAST Project
`Department of Computer Sciences
`Purdue University
`West Lafayette, IN 4 7907-1398
`{kumar,spaf}@cs.purdue.edu
`
`17 September 1992
`
`Abstract
`Computer viruses pose an increasing risk to computer data
`integrity. They cause loss of valuable data and cost an enor(cid:173)
`mous amount in wasted effort in restoration/ duplication oflost
`and damaged data. Each month many new viruses are re(cid:173)
`ported. As the problem of viruses increases, we need tools to
`detect them and to eradicate them from our systems.
`This paper describes a virus detection tool: a generic virus
`scanner in c++ with no inherent limitations on the file systems,
`file types, or host architectures that can be scanned. The tool
`is completely general and is structured in such a way that it
`can easily be augmented to recognize viruses across different
`system platforms with varied file types. The implementation
`defines an abstract c++ class, virinfo, which encapsulates
`virus features common to all scannable viruses. Subclasses
`of this abstract class may be used to define viruses that infect
`different machines and operating systems. The generality of
`the mechanism allows it to be used for other forms of scanning
`as well.
`*This paper appears in the Proceedings of the 8th Computer Security Applica(cid:173)
`tions Conference, IEEE Press, 1992.
`
`1
`
`0003
`
`

`
`1 Introduction
`
`Computer viruses pose an increasing risk to computer data in(cid:173)
`tegrity. They cause loss of valuable data and cost an enormous
`amount in wasted effort to restore or recreate damaged or destroyed
`data. Each month new viruses are reported (cf. issues of The Virus
`Bulletin and Virus News and Reviews). As the number of viruses
`increases, we need tools to detect them and eradicate them from our
`systems. While the problem of detecting, without error, all viruses
`automatically is intractable[4], it is certainly feasible to detect sim(cid:173)
`ple, known viruses.
`a generic virus scanner in
`In this paper we describe a tool -
`that can run in a high integrity virtual memo:ry environ(cid:173)
`c++ -
`ment and scan for viruses in files. These files can be available
`either on the tool's host machine itself, available through a locally
`mounted file system, or visible to it through an appropriate network
`mount. For example, the testing machine could be a UNIX worksta(cid:173)
`tion accessing a DOS filesystem through its floppy disk interface or
`remote-mounted on the workstation through a network (e.g., Novell
`or PC-NFS). The rapid proliferation and use of network software in
`the PC community has already created a need for such interfaces
`whereby PC mounted file systems and file servers may be accessible
`to more powerful workstations on the same local area network.
`One common definition of a virus is as a segment of machine
`code that installs a (possibly evolved) copy of itself into one or more
`larger "host" programs[4]. 1 When the program is executed, the code
`is activated and enables further spread of the virus, or destruction
`of data, or both. The principal cause of this problem is the almost
`nonexistent controls in most PC systems that allows user programs
`to potentially gain complete control of the system. This allows virus
`code to perform any operation, and to change any code or data.
`Looking for viruses is not a simple matter of looking for extrane(cid:173)
`ous code, because it is not always obvious what is extraneous. Re(cid:173)
`cent "stealth viruses" make even this procedure difficult by ensuring
`that the original contents of an infected file are returned when its
`contents are requested as data for examination. 2 It is more reliable
`
`1other definitions may be found in the collections [7] and [10].
`2See [8] for a good description of how stealth viruses operate.
`
`2
`
`0004
`
`

`
`to test for infected files by using a system that partitions its pro(cid:173)
`cesses into distinct address spaces by a virtual memory translation.
`This way we can avoid the effects of stealth and other memory res(cid:173)
`ident viruses on the scanning procedure. Better still would be the
`use of a scanner on a completely different architecture - one that
`cannot support the execution or spread of the searched-for viruses.
`In such an environment, when a virus scanner running as a user
`program requests bytes from a file for examination, it is assured of
`the integrity of the bytes from influence by other user programs; in
`no case can an ordinary user process modify the interrupt vectors
`of devices or traps leading to system calls.
`If one is testing for viruses on a diskette using a machine that
`runs processes in their own virtual memory space, one can be rea(cid:173)
`sonably sure about the integrity of the memory. Similarly, boot
`diskettes can be just as easily tested because we do not attempt
`to boot from the diskettes and, in the process, compromise the in(cid:173)
`tegrity of the testing machine in any way. In short, by testing on a
`machine with processes running in their own virtual address space
`we ensure, with very high confidence, the integrity of the machine
`on which we are doing the testing. This is similar to doing a high
`integrity boot of a PC before scanning for viruses on it. Further(cid:173)
`more, if we are able to run our detector in a completely different
`environment from the one containing the potential viruses, those
`viruses cannot infect or interfere with our detector.
`
`2 How viruses are detected
`
`In this section we review current methods of virus detection and
`end with the conclusion that detection by signature scanning still
`remains the most simple, economical and commonly used tool for
`virus detection.
`
`2.1 Virus monitors/detection by behavioral abnormality
`
`In this approach to virus detection, the machine is booted from
`uninfected files and a virus monitor is installed that monitors vari(cid:173)
`ous activities of the machine while in day-to-day use. The program
`monitors known methods of virus activity including attempts to in-
`
`3
`
`0005
`
`

`
`feet and evade detection. Tilis may also include attempts to write to
`boot sectors, modify interrupt vectors, write to system files, etc.
`Software monitors work best when the normal or day-to-day us(cid:173)
`age characteristics of the system are vastly different from the ac(cid:173)
`tivity profile of an infected system. Tilis desirable characteristic,
`however, is not always present. If the virus is cleverly written to
`always stay within this normal profile, it may be difficult to detect
`its presence using a monitor. For monitoring to be more effective,
`users need to be better educated about the behavior and function(cid:173)
`ing of viruses. They must know how their system works so they can
`recognize suspicious activity when the software monitor fails. 3
`The chief advantage of a properly implemented monitoring tech(cid:173)
`nique is that it works for all viruses -
`the ones currently known,
`and the ones yet to be discovered. Furthermore, it can detect in(cid:173)
`fections before they occur. Unfortunately, to always detect these
`infections, the sensitivity of the monitor must be set so high that
`it may generate many false alarms from normal activity. Further(cid:173)
`more, such detectors must be installed at a low-level on the target
`machine, and must always be run; an infected detector will not be
`of practical use in preventing further infections!
`
`2.2 Detection by emulation
`
`In this scheme the program under test is emulated by the virus
`detection program, which attempts to determine the run-time be(cid:173)
`havior of the program. Tilis is different from monitoring in that
`the program is not observed while it is actually executing but is
`emulated with sample input(s). As in the case of monitors, if the
`program attempts to change the interrupt vectors or open sensitive
`files it should be deemed suspicious.
`Detection by emulation cannot be "precise." That is, we cannot
`always correctly decide whether the program behaves like a virus.
`One difficulty in emulating a program is in determining suitable
`input(s) for it and then emulating it with all the inputs to see if
`any cause the program to exhibit a virus-like behavior. Timed or
`context-sensitive behavior may be present that fools the emulator.
`
`3 [2) and [16) (for example) list practical steps that can be taken to educate users
`against viral infections.
`
`4
`
`0006
`
`

`
`Another difficulty is in deciding the granularity at which to emulate
`the program. What is the highest granularity at which emulation
`can be done to preserve the viral property of a program?
`Few emulators for virus detection are in use today. The VProt
`program by Fridrick Skulason for MS-DOS systems has this func(cid:173)
`tion as an option.
`
`2.3 Detection by static analysis/poHcy adherence
`
`This method examines a program to decide whether it meets a pre(cid:173)
`specified policy requirement, one which may include integrity re(cid:173)
`quirements for the detection of viruses. To determine whether an
`arbitrary program contains a virus is undecidable [4], but a con(cid:173)
`servative decision on the presence of viruses may be possible. For
`example, Maria King [ 14] describes a technique by which programs
`can be analyzed to determine whether they meet a prespecified pol(cid:173)
`icy.
`A policy is specified as a regular expression and defines the char(cid:173)
`acteristics within which the programs being tested must lie in order
`to fit the policy. The policy may be decided for the entire environ(cid:173)
`ment as a whole, may vary from machine to machine, or may be
`written for clusters of machines. Once a policy is written for an en(cid:173)
`vironment, programs running in that environment can be checked
`to see if their behavior fits the policy. This is a static process, not
`done by monitoring the executing program, but writing a minispec
`for the program and verifYing that the minispec fits the policy. A
`minispec for a program is also written as a regular expression thus
`making the verification straighforward. There exist well-known and
`efficient algorithms to determine whether a regular language is a
`subset of another regular language. The problem of verification is
`reduced to finding whether the regular language generated by the
`minispec is a subset of the regular language generated by the policy.
`A minispec of a program is a subset of the behavior of the program.
`This subset comprises the behavior of interest, such as the file rna(cid:173)
`nipulation properties of the program. The minispec is written from
`the source code, design documentation, programmers' notes and
`the test results of a program.
`The chief disadvantage of this approach is that the source of
`the program is required. The source, if shown to meet the policy
`
`5
`
`0007
`
`

`
`requirements, must then be compiled by a trusted compiler before
`being used or else the executable may not accurately reflect the
`source. Furthermore, this method involves considerable overhead
`to test and implement. It also requires that all monitored software
`have a definable minispec. We are unaware of any existing system
`that uses this mechanism.
`
`2.4 Detection by checksumming
`
`To protect programs against unwanted modification by viruses, a
`checksum based on the contents of the program is computed and
`stored in encrypted form within or outside the program. The en(cid:173)
`cryption is done using a one-way function so forgery of a correct
`checksum after infection is computationally very hard. Before each
`program is executed its checksum is recomputed, encrypted and
`matched with the stored result. If the two values are identical the
`chances of infection are very low: any mismatch implies an un(cid:173)
`wanted modification.
`One problem with this approach is that it requires system sup(cid:173)
`port so the check and execution can be performed atomically. [15]
`presents an excellent survey of checksumming techniques for virus
`detection. However, the chief argument against checksumming is
`that it cannot detect viral infection in an already infected file. It
`can detect further infection but not any current infections prior to
`generating the first checksum. There is also the danger of infection
`from what Radai [15] calls an "ambiguity" virus, i.e. infections oc(cid:173)
`curring at the time of copying files or compilations. If an infection
`occurred just after closing the file being written, the new checksum
`will be different from the previous one, but there is no way to tell
`whether infection has occurred.
`Checksumming is also susceptible to the "backtrack" attack de(cid:173)
`scribed in [6]. In a backtrack attack on an executable protected by
`an encrypted checksum, the executable is disassembled, the virus
`incorporated into the source and then reassembled to produce a
`new valid cryptographic checksum. In general the infection can
`take place at any point prior to the generation of the checksum and
`one can get to this point from either direction i.e. moving from the
`source or the object module.
`
`6
`
`0008
`
`

`
`2.5 Dynamic runtime program integrity check
`
`The motivation behind this approach is to ensure the integrity of
`an executable program while it is running or to detect infections
`between a program's integrity check and execution. The basic idea
`behind this approach is to precompute an encrypted checksum for
`a predefined "granule" of the program. For example, precompute
`an encrypted checksum for each basic block in the program and
`store it with the basic block. Every time control flows to the top
`of the basic block, recompute and match the encrypted checksum
`with the stored checksum to verify integrity of the instructions in
`the basic block. Refer to [6] for more information on this approach.
`This scheme requires hardware support to run efficiently. Fur(cid:173)
`thermore, unless trusted read and checksum operations are avail(cid:173)
`able, this method will not work against stealth viruses. It also does
`not work well in environments where program files modify them(cid:173)
`selves to save configuration information, as is the case in most PC
`operating systems. Again, we know of no system using this method
`for protection.
`
`2.6 Detection by time stamp modification
`
`This scheme, outlined in [17], is very similar to detection by check(cid:173)
`summing in that the time of last modification of the program serves
`the purpose of a checksum. The time of last modification of a pro(cid:173)
`gram is usually kept external to the environment storing the pro(cid:173)
`gram. At regular intervals the timestamps are matched to ascertain
`integrity of the program. There should not be any means for the
`virus to get to this file and modify it to destroy evidence of its activ(cid:173)
`ity. Another requirement of this scheme is that the time stamping
`operation must be irreversible by any process in the system. For
`example, the modification time of a UNIX inode cannot serve as a
`timestamp mechanism of this type, because the system clock may
`be set backwards, and because inodes may be written by access to
`the raw disk. [9]
`
`7
`
`0009
`
`

`
`2. 7 Detection by signature scanning
`Using the "signature" of a virus to detect its presence in an exe(cid:173)
`cutable is the simplest and most common approach to known virus
`detection. Once a virus is isolated, a sequence of bytes (unique se(cid:173)
`quence) from its code is taken as the identifying string for that virus.
`Programs (files) are checked for the presence of this signature and
`flagged as being infected if they contain the sequence.
`Signature extraction, however, is a difficult and time-consuming
`process because it involves disassembling and debugging the infec(cid:173)
`tion to identify key portions of the virus. These portions are then
`combined to form the signature. It is then necessary to test the
`signature against a large library of programs to reduce the likeli(cid:173)
`hood of false positives occuring when the signature coincidentally
`matches some production code. Signature scanning is based on
`the assumption that the virus does not alter itself arbitrarily before
`infecting another executable. For most viruses prevalent today,
`regular expressions are sufficiently powerful to capture any such
`changes.
`Simple signatures are usually specified as a string of characters
`in ASCII: A923BF7, for example. Signatures of this form that con(cid:173)
`tain fixed patterns of hex nibbles are simple to use with efficient
`string matching algorithms. Unfortunately, for some viruses, these
`fixed patterns are not sufficiently powerful to define the signature in
`a compact way. For such viruses one would have to specify a large
`number of fixed patterns to identify a single virus. For example,
`consider a virus whose code looks like
`
`insn
`jump PC+l
`arbitrary data
`insn
`jump PC+l
`arbitrary data
`
`(A)
`
`(B)
`
`and the virus modifies the data at locations A and B before copying
`itself into another executable. For such a virus, a fixed sequence
`of hex pattern digits cannot be specified in the signature if the in(cid:173)
`structions around the arbitrarily modified data were needed in the
`
`8
`
`0010
`
`

`
`signature to uniquely identify the virus. For such viruses we may
`specify the character ?
`to signify any value for that position, i.e.
`equal to the regular expression [0-9A-F] in the signature.
`More complicated pattems can be specified to ignore (up to) a
`specified number of characters or an arbitraiy number of values. In
`general, the specification may be of the form { n, m}, which means
`skip at least n characters (nibbles) and at most m values. A specifi(cid:173)
`cation { n} can signify an arbitrary number of nibbles 2: n.
`The chief disadvantage of scanning as a means of virus detec(cid:173)
`tion is its inability to detect unknown or new viruses. Scanning also
`fails with self-encrypted viruses. It also fails with executables com(cid:173)
`pressed with different compression techniques, making the same
`virus appear different. Finally, as more viruses are discovered,
`scanning algorithms tend to run more slowly because they must
`try to match a larger set of possibilities. Also, the more signa(cid:173)
`tures there are, the more likely that any arbitrary signature will
`also match some legitimate code in some application.
`A further benefit to scanning is that it can also be used against
`embedded trojan horse code, logic bombs, and other malicious soft(cid:173)
`ware in addition to simple viruses. 4 All that is needed to detect
`these pieces of code are appropriate signatures generated from a
`disassembly. These signatures can be added to the search set and
`used without any further change to the scanning software.
`Cohen argues in [5] signature scanning is not worth pursuing
`against computer viruses. He (correctly) observes that scanning
`cannot find new viruses before their pattems are known, nor will
`such methods work against self-encrypting viruses. He attempts to
`show that an integrity shell (i.e., checksumming) is the most cost(cid:173)
`effective approach to virus protection. The argument in [5] is based
`on some questionable data and assumptions, however, and is not at
`all convincing. We believe that the cost-benefit ratio for scanners,
`either by themselves or in addition to other mechanisms, is much
`higher than he calculates. This is because of scanners' low impact
`on existing practice and because of their flexibility. We believe that
`their widespread use and continued effectiveness in the commercial
`world affirm this view. Almost all currently available commercial
`anti-virus tools use signature scanning as their primary detection
`
`4See [7], [8], or [10) for definitions ofthese terms.
`
`9
`
`0011
`
`

`
`method.
`
`2.8 Summary of detection methods
`
`Among all the methods of virus detection mentioned above, the
`simplest and most economical for detecting the majority of cur(cid:173)
`rent viruses is signature scanning. While signature scanning may
`not be able to detect all possible viruses, it is still simple and cheap
`enough to be easily available and useful to the public at large, and
`it has the least impact on existing code and hardware. Moreover, it
`is simple to add new patterns to an existing scanner whenever new
`viruses are discovered. If stealth techniques are thwarted, scanning
`will work against the majority of common viruses prevalent today. [3]
`It is for these reasons that we decided to implement a generic sig(cid:173)
`nature scanner as our first anti -virus tool.
`What follows is a brief description of the c++ classes used to
`implement our scanner, the interface presented to the user, some
`measurement results relating to the time and memory utilization of
`running the scanner on a diskette containing a distribution of file
`sizes and seeded infections.
`
`3 Description of the tool
`
`We present a brief overview of our scanning tool and give a descrip(cid:173)
`tion of the main c++ classes from which the program is structured.
`c++ , being an object oriented language, permits data encapsula(cid:173)
`tion and abstraction, and we have tried to take full advantage of
`these features in partitioning our data and functions. As a result,
`we expect that future additions and modifications to this program
`will not change the overall organization of data drastically. We also
`chose to code in c++ because it will allow us to easily retarget the
`scanner to different filetypes.
`Our program has been written using AT&T c++ 2.1 on a Sun
`SPARC running SunOS 4.1.1. There are no immediate plans to port
`it to the GNU G++ version of the c++ language. Such a port should
`not be difficult, however, as no external libraries have been used,
`and we have used GNU-compatible features of the AT&T translator.
`
`10
`
`0012
`
`

`
`3.1 A brief description of the classes used in the scanner
`Perhaps the most significant class in the scanner is the abstract
`class Virinfo that encapsulates information common to all types
`of viruses:
`
`class Virinfo
`
`protected:
`typedef Virinfo *Virinfop;
`String _name; //name of the virus
`Table _aliases; //a table of aliases
`String _sig;
`//signature, stored in hex digits
`//like 07AFB235
`int _infects;
`//interpreted in the derived class
`
`public:
`String& name() { return name; }
`Table& aliases() { return aliases;
`String sig() { return _sig; }
`int& infects() { return _infects; }
`void set_name(char *n) { name = n;
`void add_alias(char *n)
`
`String *t =new String(n);
`_aliases.push(t);
`
`void set_sig(char *n) {_sig
`
`n; }
`
`virtual void print() = 0;
`virtual int infects_ftype(String& fname)
`0;
`virtual -virinfo() { }
`virtual void read(istream&) = 0;
`//read virus info from input stream
`
`} ;
`
`All storage members of Virinfo are protected to allow derived
`classes access to them. Several public functions, for example name () ,
`aliases () , sig () , infects () ... are defined in the abstract class.
`These procedures are generic and the same for all derivations of
`
`11
`
`0013
`
`

`
`Virinfo. All the undefined functions, except for the destructor,
`are pure virtual. The input extractor function>> is not a member
`function and is defined to make a call to the pure virtual function
`read () that must be defined appropriately in derived classes of
`viruses. The function read () therefore encapsulate changes in the
`input format of virus information. An example of the format we are
`currently using is:
`Type: DOS
`Name:
`432
`50CBBCC88EDBE80600EBD900E9040106
`Signature:
`Infects:
`COM
`Aliases:
`
`The Type: field indicates the type of virus record. For the proto(cid:173)
`type we only have a DOS derived class ofvirinfo, but it is not diffi(cid:173)
`cult to add newer types in the future. Example extensions would be
`for Amiga, Macintosh, and Atari file types, and for Unix executables
`(if a non-experimental Unix virus ever appears).
`Note that we can define our pattem alphabet and regular ex(cid:173)
`pression format over almost any alphabet we care to specify -
`it
`does not matter for the design of the program. We have chosen to
`define ours over hex digits (nibbles) because it is convenient, and
`because we do not need to compensate for cross-architecture byte
`order difficulties, as would be the case ifwe defined over 16 or 32
`bit quantities. The pattems provided by others that we used in our
`testing were defined in terms of hex digits, too.
`The scanner starts by reading information about viruses from a
`signature file containing entries of the type shown above. Depend(cid:173)
`ing on the value of the type field, objects representing these virus
`types (subclasses of Vir Info) are created to represent each virus
`record.
`Once objects representing all the virus information in the input
`file are created, a sparse trie is built to represent it. Each node of
`the trie is an instance of class node (not described in this paper).
`To each node of this tree is attached a list of viruses (a dynami(cid:173)
`cally resized table with elements of type Virinfo *)that match the
`signature obtained by following the path from the root of the tree
`(global variable TreeRoot) to this node. Each node in the tree has
`a fan out of;::: 16, corresponding to each hex character [0-9A-F] in(cid:173)
`terpreted as a number plus any regular expressions that can match
`
`12
`
`0014
`
`

`
`at this node. For each signature (regular expression), we follow this
`tree edges from the root (TreeRoot), making transitions on each
`"logical" character (%n, **, *n etc. count as one logical character)
`of the signature. We create new nodes in the tree on encountering
`leaf nodes, such that the path from the root of the tree to any given
`node marks the signature ofthevirus pattern attached at that node.
`Usually, leaf nodes specify signatures but if a signature is a prefix
`of another, then the intermediate nodes can also specify signatures.
`The algorithm, for inserting a signature in this globally accessible
`tree, in pseudo code is:
`
`node *t = root of the tree;
`
`for(every character in the virus signature)
`{
`
`if(the character is a hexadecimal digit)
`{
`
`make a child of t
`corresponding to this fixed nibble;
`t = this child;
`
`else if(the character is ?)
`
`labeled ?;
`
`create a child of t
`t = this child;
`I* there may be several marked ?
`if two signatures have a common
`prefix including a ?
`
`*I
`
`else if(the next two characters are
`of the form %d)
`
`create a child of t
`t = this child;
`/* there may be several
`marked % similarly */
`
`labeled %;
`
`else if(the next two characters are
`of the form *d)
`
`create a child of t labeled *;
`t = this child;
`
`13
`
`0015
`
`

`
`/* there may be several
`marked * similarly */
`
`else if(the next two characters are
`of the form **)
`
`create a child of t labeled
`t = this child;
`
`**• ,
`
`add this regular expression at node t;
`
`For example, for the following virus entries
`
`Type: DOS
`Name: A
`012
`Signature:
`Infects: COM SYS
`
`Type: DOS
`Name: B
`Signature: OlA
`Infects: COM ARC
`
`the tree looks like in figure 1.
`
`There is also an ifNibbleStream class that accesses a file in
`nibbles (characters) rather than bytes. This class can be used to
`encapsulate the file 1/0 interface from the program. Our definition
`is based on the if stream class available with the AT&T c++ distri(cid:173)
`bution on UNIX. This class can save the current nibble position in
`the file stream and restore it.
`The class Directory recurses through a given file system (usu(cid:173)
`ally, directory tree) and generates the full pathnames of all the files
`rooted at the tree. Names of files are returned by successive calls to
`its member function next () . This class encapsulates the file sys(cid:173)
`tem interface and directory structure visible to the program. These
`names are then used to open a "NibbleStream" so as to scan the
`contents of the file.
`
`14
`
`0016
`
`

`
`(1)
`
`(2)
`
`(3)
`
`(4)
`
`(5)
`
`point to a liat of viru110111
`:Ld.entified at this node
`
`A
`012
`
`B
`OlA
`
`F
`
`Figure 1: The node tree
`
`Thus, whether the filenames are /homes/kumar/foo or \homes\kumar\foo
`is completely transparent to the program and is completely encap-
`sulated within the classes Directory and ifNibbleStream. The
`scanner just requires a sequence of nibbles to do its matching, it
`is unaware of the file name syntax or the mechanism of opening
`and closing files. Not only does this allow complete independence
`from the underlying file system, it allows the scanner to work on
`encrypted, archived, and compressed files, so long as they can be
`returned as a stream of 4-bit nibbles and can be recognized during
`the filesystem scan.
`
`3.2 Why is our scanner generic?
`Our scanner is geneTic in the sense of and to the extent afforded by
`an object oriented programming language. We mean that there are
`no hardcoded dependencies that make it impossible to extend, and
`that as a principle, the same routines can apply to newer types of
`viruses, file systems, file formats etc. Whatever modifications may
`be required to the code are minimal and quite structured.
`We believe most of these features are well-supported with our
`choice of the programming paradigm more than would be afford
`by simply employing a disciplined programming style in a more
`
`15
`
`0017
`
`

`
`conventional language. Consider, for example, how we might add a
`new virus type joo in the scanner. We would derive a subclass joo of
`the abstract class Vir Info and define all the pure virtual functions
`in the abstract class. The routine that reads the file containing the
`signatures switches on the Type : information of the virus to create
`objects of that type. This routine, of necessity, must be modified
`to add another type so it can create objects of type joo when it
`encounters a description of a virus of type joo. For the most part
`this is all that needs to be done. All the modification to the code
`is encapsulated inside the pure virtual functions and defining them
`for joo is all that is required. The pattem matching function will
`continue to work for foo.
`Consider now, that the files to be scanned reside in a file system
`which is quite like the UNIX file system except that the component
`separator in a file name is \ instead of I. All the changes for this
`case are encapsulated in the class Directory and the rest of the
`program can probably run unmodified. Where genericity extends
`encapsulation is in permitting a derived class of Directory to be
`used wherever the class Directory was used, thus obviating the
`direct alteration of the class Directory.
`
`3.3 A brief description of the pattern matching algorithm
`The following regular expressions are supported in the scanner,
`the specification is as in TBSCAN. This algorithm is similar to the
`Aho-Corasick algorithm [1], but it has been extended for wildcard
`characters.
`
`? match any nibble in the input stream
`skip 0-n nibbles in the input stream
`%n
`*n skip exactly n nibbles
`* * skip an arbitrary number, including 0
`The algorithm considers every nibble position in the input stream
`as a possible beginning of a virus sequence. For each such position,
`it systematically maintains the set of possible signatures that match
`as a prefix the input stream nibbles from that fixed position to the
`current position. Matching stops when the nibble pattern from the
`fixed position to the current position match any virus signature
`
`16
`
`0018
`
`

`
`entirely. Backtracking can occur if signatures contain regular ex(cid:173)
`pression patterns ** & %n. Backtracking is currently implemented
`in a straightforward manner using recursion: failure pointers can
`be calculated, as explained in [1]. The current algorithm in pseu(cid:173)
`docode looks as follows:
`
`for(each nibble in the input stream)
`if(traverse(inputNibbleStream, TreeRoot))
`{
`
`I* virus detected *I;
`
`node *traverse(ifNibbleStream& i, node& n)
`
`if(there are virus signatures
`a

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket