`
`DYNAMIC DETECTION AND CLASSIFICATION OF
`COMPUTER VIRUSES USING GENERAL BEHAVIOUR
`PATTERNS
`
`Morton Swimmer
`
`Virus Test Center, University of Hamburg, Odenwaldstr. 9, 20255 Hamburg, Germany
`Tel +49 404 910041 · Fax +49 405 471 5226 ·Email swimmer@acm.org
`
`Baudouin Le Charlier and Abdelaziz Mounji
`
`F.U.N.D.P., Institut d'Informatique, University of Namur, Belgium
`Email ble@info.fundp.ac.be / amo@info.fundp.ac.be
`
`ABSTRACT
`
`The number of files which need processing by virus labs is growing exponentially. Even though only a small
`proportion of these files will contain a new virus, each file requires examination. The normal method for
`dealing with files is still brute force manual analysis. A virus expert runs several tests on a given file and
`delivers a verdict on whether it is virulent or not. If it is a new virus, it will be necessary to detect it. Some
`tools have been developed to speed up this process, ranging from programs which identify previously-classified
`files to programs that generate detection data. Some anti-virus products have built-in mechanisms
`based on heuristics, which enable them to detect unknown viruses. Unfortunately all these tools have
`limitations.
`
`In this paper, we will demonstrate how an emulator is used to monitor the system activity of a virtual PC,
`and how the expert system ASAX is used to analyse the stream of data which the emulator produces. We use
`general rules to detect real viruses generically and reliably, and specific rules to extract details of their
`behaviour. The resulting system is called VIDES: it is a prototype for an automatic analysis system for
`computer viruses and possibly a prototype anti-virus product for the emerging 32-bit PC operating
`systems.
`
`1 INTRODUCTION
`
`Virus researchers must cope with many thousands of suspected files each month, but the problem is not so
`much the number of new viruses (perhaps a few hundred, growing at a nearly exponential rate) as the
`number of files the researcher receives and must analyse: the glut. Out of perhaps one hundred
`files, only one may actually contain a new virus. Unfortunately, there are no short cuts. Every file has to be
`processed.
`
`VIRUS BULLETIN CONFERENCE ©1995 Virus Bulletin Ltd, 21 The Quadrant, Abingdon, Oxfordshire, OX14 3YS, England.
`Tel. +44 (0)1235 555139. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form
`without the prior written permission of the publishers.
`
`BLUE COAT SYSTEMS - Exhibit 1005
`
`The standard method of sorting out such files is still brute force manual analysis, requiring specialists.
`Some tools have been developed to help cope with the problem, ranging from programs which identify and
`remove previously-classified files and viruses to utilities which extract strings from infected files that aid in
`identifying the viruses. However, none of the solutions are satisfactory. Clearly, more advanced tools are
`needed.
`
`In this paper, the concept of dynamic analysis as applied to viruses is discussed. This is based on an idea
`called VIDES (Virus Intrusion Detection Expert System), coined at the Virus Test Center [BFHS91]. The
`system will comprise a PC emulation and an IDES-like expert system. It should be capable of detecting
`viral behaviour using a set of a priori rules, as shown in the preliminary work done with Dr. Fischer-Hübner.
`Furthermore, advanced rules will help in classifying the detected virus.
`
`The present version of VIDES is only of interest to virus researchers; it is not designed to be a practical
`system for the end-user - its demands on processing power and hardware platform are too high. However, it
`can be used to identify unknown viruses rapidly and provide detection and classification information to the
`researcher. It also serves as a prototype for the future application of intrusion detection technology in
`detecting malicious software under future operating systems, such as OS/2, MS-Windows NT and 95,
`Linux, Solaris, etc.
`
`The rest of the paper is organized as follows: Section 2 presents the current state of the art in anti-virus
`technology; Section 3 describes a generic virus detection rule; Section 4 discusses the architecture of the PC
`auditing system; Section 5 shows how the expert system ASAX is used to analyse the activity data collected
`by the PC emulator; and finally, Section 6 contains some concluding remarks.
`
`2 CURRENT STATE OF THE ART
`
`For the purpose of discussion it will be necessary to define the term computer virus.
`
`2.1 TERMS
`There is still no universally-agreed definition for a computer virus. What is missing is a description which
`is still general enough to account for all possible implementations of computer viruses. An attempt was
`made in [Swi95], which is the result of many years of experience with viruses in the Virus Test Center. The
`following definition for a computer virus is the result of discussion in comp.virus (Virus-L) derived from
`[Seb]:
`
`Def 1
`
`A Computer Virus is a routine or a program that can 'infect' other programs by modifying them
`or their environment such that a call to an infected program implies a call to a possibly evolved,
`functionally similar, copy of the virus.
`
`A more formal, but less useful, definition of a computer virus can be found in [Coh85]. Using the formal
`definition, it was possible to prove the virus property undecidable.
`
`We talk of the infected file as the host program. System viruses infect system programs, such as the boot
`or Master Boot Sector, whereas file viruses infect executable files such as EXE or COM files. For an
`in-depth discussion of the properties of viruses, please refer to literature such as: [Hru92], [SK94], [Coh94] or
`[Fer92].
`
`Today, anti-virus technology can be divided into two approaches: the virus-specific and the generic
`approach. In principle, the former requires knowledge of the viruses before they can be detected. Due to
`advances in technology, this prerequisite is no longer entirely valid in many of the modern anti-virus
`products. This type of technology is known to us as a scanner. The latter attempts to detect a virus by
`observing attributes characteristic of all viruses. For instance, integrity checkers detect viruses by checking
`for modifications in executable files, a characteristic of many (although not all) viruses.
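The integrity-checking idea can be illustrated with a minimal sketch. It is not taken from any particular product: the function names are hypothetical, files are modelled as in-memory byte strings, and SHA-256 stands in for whatever checksum a real integrity checker would use.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_baseline(files: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per executable while the system is believed clean."""
    return {name: digest(data) for name, data in files.items()}


def modified_files(baseline: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """Names whose contents no longer match the baseline, or that vanished."""
    return sorted(
        name
        for name, expected in baseline.items()
        if name not in files or digest(files[name]) != expected
    )
```

A checker of this kind reports any change, whether or not a virus caused it, and it says nothing about viruses that leave executable files untouched.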
`
`
`VIRUS BULLETIN CONFERENCE, SEPTEMBER 1995
`
`2.2 VIRUS SPECIFIC DETECTION
`
`Virus-specific detection is by far the most popular type of virus protection used on PCs. Information
`from the virus analysis is used in the so-called scanner to detect it. Usually, a scanner uses a database of
`virus identification information which enables it to detect all viruses previously analysed.
`
`The term scanner has become increasingly inaccurate. It comes from lexical scanner, i.e.
`a pattern matching tool. Traditionally scanners have been just that. The information extracted from viruses
`consisted of strings which were representative of that particular virus. This means that the string has to:
`
`• differ significantly from all other viruses, and
`
`• differ significantly from strings found in bona fide anti-virus programs.
`
`Finding such strings was the entire art of anti-virus program writing until polymorphic viruses appeared on
`the scene.
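A traditional string scanner of the kind described above can be sketched in a few lines. The signature names and byte strings below are invented for illustration and do not correspond to any real virus.

```python
# Hypothetical signature database: virus name -> representative byte string.
SIGNATURES: dict[str, bytes] = {
    "Example.A": bytes.fromhex("deadbeef0102"),
    "Example.B": bytes.fromhex("cafebabe0304"),
}


def scan(data: bytes, signatures: dict[str, bytes] = SIGNATURES) -> list[str]:
    """Return the names of all signatures that occur anywhere in the data."""
    return sorted(name for name, sig in signatures.items() if sig in data)
```

The quality of such a scanner depends entirely on the choice of strings: a string shared with another virus leads to misidentification, and a string shared with a legitimate program leads to a false alarm.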
`
`Encrypted viruses were the first minor challenge to string searching methods. The body of the virus was
`encrypted in the host file, and could not be searched for, due to its variable nature. However, the body was
`preceded by a decryptor-loader which must be in plain text (unencrypted code); otherwise it would not be
`executable. This decryptor can still be detected using strings, even if it becomes difficult to differentiate
`between viruses.
`
`Polymorphic viruses are the obvious next step in avoiding detection. Here, the decryptor is implemented
`in a variable manner, so that pattern matching becomes impossible or very difficult. Early polymorphic
`viruses were identified using a set of patterns (strings with variable elements). Moreover, simple virus
`detection techniques are made unreliable by the appearance of the so-called Mutation Engines such as
`MtE and TPE (Trident Polymorphic Engine). These are object library modules generating variable
`implementations of the virus decryptor. They can easily be linked with viruses to produce highly
`polymorphic infectors. Scanning techniques are further complicated by the fact that the resulting viruses
`do not have any scan strings in common even if their structure remains constant. When polymorphic
`technology improved, statistical analysis of the opcodes was used.
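Patterns with variable elements can be sketched as byte strings containing wildcards. The notation used here (hex bytes separated by spaces, with `??` matching any single byte) and the function names are assumptions made for this example, not a format used by any specific scanner.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Turn e.g. 'B8 ?? ?? 8E D8' into a compiled bytes regular expression."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL so that the wildcard also matches the byte 0x0A.
    return re.compile(b"".join(parts), re.DOTALL)


def find_pattern(data: bytes, pattern: str) -> int:
    """Return the offset of the first match, or -1 if there is none."""
    match = compile_pattern(pattern).search(data)
    return match.start() if match else -1
```

Wildcards cope with a decryptor whose constants change from one infection to the next, but not with one whose instructions themselves are chosen differently each time, which is the situation mutation engines create.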
`
`Recently, the best of the scanners have shifted course from merely detecting viruses to attempting to
`identify the virus. This is often done with added strings, perhaps position dependent, or checksums, over the
`invariant part of the virus. To support this, many anti-virus products have implemented machine-code
`emulators so that the virus' own decryptor can be used to decrypt the virus. Using these enhancements, the
`positive identification of even polymorphic viruses poses no problem.
`
`The next shift many scanners are presently experiencing is away from known virus only detection to
`detection of unknown viruses. The method of choice is heuristics. Heuristics are built into an anti-virus
`product in an attempt to deduce whether a file is infected or not. This is most often done by looking for a
`pattern of certain code fragments that occur most often in viruses and hopefully not in bona fide programs.
`
`Heuristic analysis suffers from a moderate to high false-positive rate. Of course, a manufacturer of a
`heuristic scanner will tune the heuristics to avoid false positives while still finding all new viruses, but
`the two goals cannot both be achieved completely. Usually, a heuristic scanner will contain a 'traditional'
`pattern-matching component, so that viruses can be identified by name.
`
`2.3 GENERIC VIRUS DETECTION
`
`Computer viruses must replicate to be viruses. This means that a virus must be observable by its mechanism
`of replication.
`
`
`In practice, we only represent those actions relevant to the infection scenario. As a result, many possible
`actions may occur between adjacent states, but are not recorded because they do not entail a modification in
`the current state. In terms of auditing, irrelevant audit records may be present in the sequence of audit
`records representing the infection signature.
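The point that irrelevant records may be interleaved with the signature can be made concrete with a subsequence test. This is only a sketch: events are reduced to plain strings, and the event names in the usage note are hypothetical.

```python
from typing import Iterable, Sequence


def matches_signature(trail: Iterable[str], signature: Sequence[str]) -> bool:
    """True if the signature events occur in order within the trail,
    with any number of unrelated events allowed in between."""
    position = 0
    for event in trail:
        if position < len(signature) and event == signature[position]:
            position += 1  # this record moves the scenario one state forward
        # any other record leaves the current state unchanged
    return position == len(signature)
```

For example, the trail `open, get_time, read, seek_end, write` contains the signature `open, read, seek_end, write`, even though `get_time` sits between two of its steps.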
`
`For the sake of simplicity, discussion of the generic detection rules is based on the state transition
`diagrams described above.
`
`3.2 BUILDING THE RULES
`
`VIDES uses three types of detection rules: generic detection rules, virus-specific rules, and other rules. As
`their name implies, generic rules are used to detect all viruses which use a known attack pattern. For this, models
`of virus behaviour are needed for the target system (in our case MS-DOS). Virus-specific rules use
`information from a previous analysis to detect that specific virus, or direct variants. These rules are similar
`to virus-specific detection programs, except for the fact that they analyze the dynamic behaviour of the virus
`instead of its code. Finally, there are the 'other rules' for gleaning other information from the virus which
`can be used in its classification.
`
`We will not go into the virus-specific rules or the 'other' rules, concentrating instead on the generic rules.
`
`In developing a generic rule for detecting viruses, we need to have a model for the virus attack. No one
`model will do, because MS-DOS viruses can choose from many effective strategies. This is
`compounded by the diversity of executable file types for MS-DOS. Fortunately for us, the majority of
`viruses have chosen one particular strategy, and infect only two types of executable files. This means that
`we can detect most viruses with very few rules. On the other hand, a virus which uses an unknown attack
`strategy will not be detected. For this reason, the prototype analysis system contains an auxiliary static
`analysis component to detect such problems.
`
`In the following, we will develop a generic rule which detects file infectors that modify the file directly to
`gain control over that file. We will concentrate on COM file infectors. EXE file infectors are detected in an
`analogous way.
`
`We must make two assumptions about the behaviour of DOS viruses to help us build the rule.
`
`Assumption 1:
`
`A file-infecting virus modifies the host file in such a way that it gains control over the
`host file when the host file is run.
`
`This is a specific version of the virus definition (Def 1). However, it doesn't specify when the virus gains
`control over the host file.
`
`Assumption 2:
`The virus in an infected file receives control over the file before the original host
`program.
`
`That is, when the infected file is run, the virus is run before the host program.
`
`Discussion: If the virus never gains control over the host file, it would not fulfil the definition of a virus.
`This observation leads to Assumption 1. However, there is no reason (in the definition) why the virus must
`gain control before the host does.
`
`We make an additional assumption that the virus does gain control before the host program does. The reason
`we do this is to avoid very blatant false positives. However, it should be noted that Assumption 2 does not
`result from the virus definition, and will cause some viruses to be missed. For these cases, other rules are
`used.
`
`
`events. In fig. 2, reopen is abstracted as a transition element, whereas its implementation is as a separate
`rule.
`
`MS-DOS provides two methods of accessing files. The most common method uses file handles. Access
`using file control blocks (FCB) was provided for compatibility with CP/M, and is rarely used, even by
`viruses. However, because it is used, we need a separate rule to handle this method. The basic rule stays the
`same, but internal handling of the data is different.
`
`We could avoid this problem by abstracting the audit data to give us a generic view of the system events.
`This way, we could reduce the number of audit records to only relevant higher-level records by using a
`filter. After that, processing becomes simpler as the problems of reopens and handle/FCB use disappear.
`This method also allows us to apply the rules on non-MS-DOS systems which provide similar file handling.
`
`As a matter of fact, ASAX itself is the logical choice to act as the filter. The first ASAX system reads the
`raw audit trail, converts it into generic data, and pipes its output as a NADF file for further processing (see
`Section 5). Using ASAX as a filter allows us to reduce the complexity of maintaining such a system while
`not sacrificing any power.
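The filtering step can be pictured as follows. This is a sketch in Python rather than RUSSEL, and the record layout, the `api` labels and the operation names are all hypothetical; it only shows the two effects mentioned in the text, namely hiding the handle/FCB distinction and dropping records that are irrelevant to the detection rules.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class RawRecord:
    api: str     # "handle" or "fcb": which DOS file interface was used
    op: str      # e.g. "open", "read", "write", or an unrelated service
    target: str  # file the call refers to


@dataclass(frozen=True)
class GenericEvent:
    op: str
    target: str


# Operations the later detection rules are assumed to care about.
RELEVANT_OPS = frozenset({"open", "read", "write", "seek", "close"})


def abstract(records: Iterable[RawRecord]) -> Iterator[GenericEvent]:
    """Map raw records to generic events and drop irrelevant ones."""
    for record in records:
        if record.op in RELEVANT_OPS:
            # The api field is deliberately not copied: downstream rules
            # see the same event for handle-based and FCB-based access.
            yield GenericEvent(record.op, record.target)
```

Because the output no longer mentions DOS-specific mechanisms, the same second-stage rules could in principle be fed from a different front end.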
`
`4 PC AUDITING
`
`The prerequisite for using an Intrusion Detection (ID) system like ASAX is an audit system which securely
`collects system activity data. In addition, integrity of the ID system itself must not be compromised: this
`means that the audit data retrieval, analysis and archiving must be secured against corruption by viruses.
`Moreover, the ID system must not be prevented from reporting (raising alarms, updating virus information
`databases) the results of such analysis. DOS neither provides such a service, nor makes the implementation
`of such a service easy. Its total lack of security mechanisms means that the collection of data can be
`subverted. Even if the collection can be secured, the data is open to manipulation if stored on the same
`machine.
`
`For the prototype of VIDES, we were not bound to a real world implementation, so we explored various
`alternative possibilities. The experience gained by the use of such a system will not benefit DOS users, but
`should be applicable to users of various emerging 32-bit operating systems which offer DOS support.
`
`We have made several attempts to build a satisfactory audit system: these are described hereafter.
`
`4.1 DOS INTERRUPTS
`
`All DOS services are provided to application programs via interrupts, which can be described as indexed
`inter-segment calls. Primarily, interrupt 0x21 is used. The requested service is entered into the AH
`register and its parameters are entered into the other registers. When the service is finished, it returns
`control to the calling program and provides its results in registers or in buffers.
`
`The very first implementation of an auditing system was a filter which was placed before DOS services and
`registered all calls to DOS functions. This was done very early on, together with Dr. Fischer-Hübner, to
`prove the feasibility of the VIDES concept. It also demonstrated the limits which DOS imposes on the
`implementation of such an auditing system: it did not run reliably, and could be subverted by tunnelling
`viruses.
`
`This implementation was soon scrapped, but it did prove that the premise was correct: viruses could be
`found using ID technology. This was perhaps the first such trial to be carried out [BFHS91].
`
`
`4.2 VIRTUAL 8086 MACHINE
`The Intel iAPX 386 introduced the so-called virtual 8086 machine mode. A protected mode operating system
`can create many virtual 8086 machines in which tasks can run completely isolated from each other and from
`the operating system. Each task 'sees' only its own environment. Operating systems such as OS/2 use these
`constructs to provide a full DOS environment for DOS programs. All calls to the machine (via the BIOS
`interface or direct port access) and DOS are redirected to the host operating system (OS/2 in this case) for
`processing.
`
`This mechanism can also be used to monitor the activity in a DOS session. Because all interrupts are being
`redirected to the native operating system, the native operating system can record the activity securely and
`unobtrusively.
`
`Care has to be taken in the implementation of the virtual 8086 machine. The DOS windows in OS/2 have
`been shown in tests at the VTC to be too permissive. In the course of a comprehensive test including the
`entire collection of file viruses, many of the viruses running under a DOS window managed to harm vital
`parts of the system. One problem was that OS/2 files could be manipulated directly from within the DOS
`session. However, this did not explain the corruption of the running operating system.
`
`Even though using a virtual 8086 machine was the original method of choice, such experiments showed that
`building a safe implementation would be too complex. A more secure method was sought for
`the prototype.
`
`4.3 HARDWARE SUPPORT
`Hardware debugging systems, such as the Periscope IV, may be used to monitor system events closely in
`real time. This is achieved by a card which is fitted between the CPU and the motherboard and which can set
`break points on various types of events on the PC's bus. The card is connected to a receiving card in a second PC
`which is used to control the debugging session.
`
`Monitoring system behaviour on a DOS machine can be accomplished by capturing the Interrupt 0x21
`directly, or by setting a break point in the resident DOS kernel. Special memory areas can be monitored by
`setting a break condition on access to those areas.
`
`The monitoring is completely unobtrusive, i.e. the program will not notice a difference between running
`with or without the debugger. When an event is triggered, the PC is stopped while the controlling PC is
`processing the data. If the controlling PC is fast enough, the time delay should be nearly negligible.
`
`A hardware solution using the Periscope IV is complicated by the problem of automating the processes
`necessary to test large numbers of viruses on different operating systems. When such a solution is
`implemented, it will offer the possibility of testing viruses on other PC operating systems which require full
`iAPX 386 compatibility.
`
`4.4 8086 EMULATION
`The solution which was finally chosen was the software emulation of the 8086 processor. An emulation is a
`program which accepts the entire instruction set of a processor as input, and interprets the binary code as the
`original processor would. All other elements of the machine must be implemented or emulated, e.g. the
`various ports. To simplify and speed up the emulation, the BIOS code (Basic Input Output System, the
`interface between the operating system and the hardware) can be replaced with special emulation hooks, so
`that the complicated machine access can be skipped as long as all access to those services is routed via the
`BIOS. In the case of a graphics adapter, the entire hardware must be emulated, whereas disk access can be
`handled with hooks in the BIOS.
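The principle of interpreting opcodes and diverting selected interrupts to hooks can be shown with a toy interpreter. This is emphatically not the emulator used for VIDES: the class name and structure are invented here, and only four real 8086 opcodes are handled (NOP, MOV AH with an 8-bit immediate, INT, and HLT), whereas a usable emulation must cover the whole instruction set, the flags, memory segmentation and the ports.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TinyCPU:
    memory: bytearray
    ip: int = 0
    ah: int = 0
    halted: bool = False
    # interrupt number -> Python function run in place of machine-level code
    hooks: dict[int, Callable[["TinyCPU"], None]] = field(default_factory=dict)

    def step(self) -> None:
        opcode = self.memory[self.ip]
        if opcode == 0x90:    # NOP
            self.ip += 1
        elif opcode == 0xB4:  # MOV AH, imm8
            self.ah = self.memory[self.ip + 1]
            self.ip += 2
        elif opcode == 0xCD:  # INT imm8
            number = self.memory[self.ip + 1]
            self.ip += 2
            hook = self.hooks.get(number)
            if hook is not None:
                hook(self)    # the service is handled outside the emulated machine
        elif opcode == 0xF4:  # HLT
            self.halted = True
        else:
            raise NotImplementedError(f"opcode {opcode:#04x} is not emulated")

    def run(self, max_steps: int = 1000) -> None:
        for _ in range(max_steps):
            if self.halted:
                break
            self.step()
```

Registering a hook for interrupt 0x21 that records `cpu.ah` is enough to observe which DOS service a program requests, because the service number is passed in the AH register.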
`
`
`Using emulation gives us all the advantages of the hardware solution plus the possibility of handling
`everything in pseudo real-time with respect to the program running in the emulation. Because even the
`time-giving functions are steered by the emulation, the time inside the emulation can be stopped
`whenever the emulation is interrupted to process an event.
`
`The emulation is 'safe' as the running virus has no access to the host machine at all. This is because the
`target machine's memory is being controlled entirely by the emulation, and file accesses are directed to a
`virtual disk, stored as a disk image file.
`
`The major problem with using an emulation is its lack of speed. Even on fast platforms, the running speed
`is only marginally faster than an original PC/XT.
`
`4.5 ACTIVITY DATA FORMAT
`Audit records representing the program behaviour in general, and virus activity in particular, have a pattern
`which is borrowed from Dorothy Denning's model of Intrusion Detection [Den87] (<Subject, Action,
`Object, Exception-Condition, Resource-Usage, Time-Stamp>). However, due to the way processes are
`handled in DOS, this pattern is slightly modified to collect useful available attributes. For instance, the code
`segment of a process is chosen instead of the common process identifier in most existing multi-user
`operating systems.
`
`The attributes of audit records as collected by the PC emulator have the following meaning: code
`segment is the address in memory of the executable image of the program; function number is the number
`of the DOS function requested by the program; arg(..) is a list of register/memory values used in the
`call to a DOS function; ret(..) is a list of register/memory values as returned by the function call;
`RecType is the type of the record; StartTime and EndTime are the time stamps of action start and end
`respectively. The final format for an MS-DOS audit record is as follows: <code segment, RecType,
`StartTime, EndTime, function number, arg(..), ret(..)>. An example of an audit trail is given in fig. 3.
`
`<CS=3911 Type=0 Fn=30 arg() ret( AX=5)>
`<CS=3911 Type=0 Fn=29 arg() ret( BX=128 ES=3911)>
`<CS=3911 Type=0 Fn=64 arg( AL=61 CL=3 str1=*.COM) ret( AL=0 CF=0)>
`<CS=3911 Type=0 Fn=51 arg( AL=0 str1=COMMAND.COM) ret( AL=0 CX=32 CF=0)>
`<CS=3911 Type=0 Fn=51 arg( AL=1 str1=COMMAND.COM) ret( AL=0 CX=32 CF=0)>
`<CS=3911 Type=0 Fn=45 arg( AL=2 CL=32 str1=COMMAND.COM) ret( AL=0 AX=5 CF=0)>
`<CS=3911 Type=0 Fn=73 arg( BX=5) ret( CX=10241 DX=6206 CF=0)>
`<CS=3911 Type=0 Fn=27 arg() ret( CX=5121 DX=8032)>
`<CS=3911 Type=0 Fn=47 arg( BX=5 CX=3 DX=828 DS=3911) ret( AX=3 CF=0)>
`<CS=3911 Type=0 Fn=50 arg( AL=2 BX=5 CX=0 DX=0) ret( AL=0 AX=50031 DX= CF=0)>
`<CS=3911 Type=0 Fn=48 arg( BX=5 CX=648 DX=313 DS=3911) ret( AX=648 CF=0)>
`<CS=3911 Type=0 Fn=50 arg( AL=0 BX=5 CX=0 DX=0) ret( AL=0 AX=0 DX=0 CF=0)>
`<CS=3911 Type=0 Fn=48 arg( BX=5 CX=3 DX=831 DS=3911) ret( AX=3 CF=0)>
`<CS=3911 Type=0 Fn=74 arg( BX=5 CX=10271 DX=6206) ret( CF=0)>
`<CS=3911 Type=0 Fn=46 arg( BX=5) ret( CF=0)>
`<CS=3911 Type=0 Fn=51 arg( AL=1 str1=COMMAND.COM) ret( AL=0 CX=32 CF=0)>
`
`Figure 3: Excerpt from an audit trail for the Vienna virus
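Records in the textual form of Figure 3 can be read back into a structured form with a small parser. The sketch below assumes the layout `<CS=.. Type=.. Fn=.. arg(..) ret(..)>` with whitespace-separated `name=value` pairs; the class and function names are hypothetical, and all values are kept as strings because the excerpt does not state in which radix the numbers are printed.

```python
import re
from dataclasses import dataclass

RECORD_RE = re.compile(
    r"<\s*CS=(?P<cs>\w+)\s+Type=(?P<type>\w+)\s+Fn=(?P<fn>\w+)\s+"
    r"arg\s*\((?P<arg>[^)]*)\)\s*ret\s*\((?P<ret>[^)]*)\)\s*>"
)


@dataclass
class AuditRecord:
    code_segment: str
    rec_type: str
    function: str
    args: dict[str, str]
    rets: dict[str, str]


def _fields(text: str) -> dict[str, str]:
    """Split 'AL=2 BX=5' into {'AL': '2', 'BX': '5'}."""
    return dict(item.split("=", 1) for item in text.split())


def parse_record(line: str) -> AuditRecord:
    match = RECORD_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an audit record: {line!r}")
    return AuditRecord(
        match["cs"], match["type"], match["fn"],
        _fields(match["arg"]), _fields(match["ret"]),
    )
```

The StartTime and EndTime fields of the full record format are not handled, since the figure omits them.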
`
`4.6 ACTIVITY DATA COLLECTION
`The audit system was integrated into an existing PC emulation by placing hooks into the module for
`processing all opcodes corresponding with the events (see fig. 4). These are primarily calls to the DOS
`functions. This was implemented in such a way, that stealth and tunnelling viruses could not circumvent the
`mechanism. A separate module receives notification of the event and pushes all parameters on to a stack.
`When the DOS call returns, the parameters are popped from the stack and sent to the audit trail with the
`return values.
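The push-on-call, pop-on-return bookkeeping described above might look like this in outline. The class and method names are hypothetical, and registers are modelled as a plain dictionary.

```python
class AuditRecorder:
    """Pairs each DOS call with its return values before writing a record."""

    def __init__(self) -> None:
        self._pending: list[tuple[int, dict[str, int]]] = []
        self.trail: list[tuple[int, dict[str, int], dict[str, int]]] = []

    def on_call(self, function: int, args: dict[str, int]) -> None:
        # Copy the arguments: the registers may be overwritten by the service.
        self._pending.append((function, dict(args)))

    def on_return(self, rets: dict[str, int]) -> None:
        function, args = self._pending.pop()
        self.trail.append((function, args, dict(rets)))
```

A stack is a natural fit because a call that is entered while another call is still in progress also finishes first, so the most recently pushed parameters always belong to the call that is returning.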
`
`[Figure 4: Modules in Pandora. Module labels recoverable from the diagram: cpu, audit, hardware, bios, vga,
`vgahard, xstuff, X11, UNIX (mfs).]
`
`Internally, the audit trail complies with a canonical format, which is also ASAX's native format. This is very
`generic, and allows most types of records to be implemented.
`
`An example of an audit trail is printed in Figure 3. This is a human readable representation of the binary
`NADF file. The example is from an audit trail of the Vienna virus. The text representation does not
`comply exactly with the binary version. Some of the less important fields are missing so that the audit
`record becomes clearer and shorter.
`
`In the next section, we show how the activity data produced by the emulator is analysed using ASAX.
`
`4.7 USING RUSSEL TO DETECT INFECTION SCENARIOS
`In this section, we show how the RUSSEL language can be used effectively to detect an infection scenario.
`We first model the infection as a state transition diagram, then briefly show how this diagram can be
`translated into RUSSEL rules.
`
`Each state in the diagram is represented by a rule describing not only the current state, but also the sequence
`of previous states leading to it. The actual parameters of the current rule encode all the relevant information
`collected in previously-visited states. A transition in the diagram is represented by the rule-triggering
`mechanism of the RUSSEL language as described in section 5. The actual parameters of the triggered rule are
`computed from the data items conveyed by the current audit record and from the parameters of the current
`rule. Once triggered, the new rule represents the new current state in the transition diagram.
`
`In particular, the very first active rule at the beginning of the detection process has no actual parameters,
`since no information is contained in the initial state (one can argue that the initial state contains this
`assertion: system is clean. That is then represented by an empty list of parameters). As an example, the
`states s0, s1 and s2 of fig. 5 are represented by the rules Open, readBOF(...), and lseekEOF(...) respectively.
`Figure 4.7 depicts this set of rules in the RUSSEL language. In this figure, RUSSEL keywords are noted in
`bold-face characters, words in italic style identify fields in the current audit record, and actual parameters are
`noted in roman-style words.
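The correspondence between states and parameterised rules can be imitated outside RUSSEL. The following Python sketch is one plausible reading of the scheme, not a transcription of the RUSSEL rules: the event fields and operation names are hypothetical, the condition attached to each transition is an assumption, file positions are not tracked, and the class `Later` merely stands in for the remaining states of the diagram.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Event:
    """One abstracted audit record; all field names are hypothetical."""
    op: str
    handle: int = -1
    name: str = ""
    count: int = 0
    position: int = 0


@dataclass(frozen=True)
class Open:
    """s0: nothing is known yet, so this rule has no parameters."""

    def step(self, ev: Event):
        if ev.op == "open" and ev.name.upper().endswith(".COM"):
            return ReadBOF(ev.name, ev.handle)
        return self


@dataclass(frozen=True)
class ReadBOF:
    """s1: a COM file is open; wait for a read on the same handle."""
    name: str
    handle: int

    def step(self, ev: Event):
        if ev.op == "read" and ev.handle == self.handle:
            return LseekEOF(self.name, self.handle, ev.count)
        return self


@dataclass(frozen=True)
class LseekEOF:
    """s2: the start of the file was read; wait for a seek to its end."""
    name: str
    handle: int
    bytes_read: int

    def step(self, ev: Event):
        if ev.op == "seek_end" and ev.handle == self.handle:
            return Later(self.name, self.handle, self.bytes_read, ev.position)
        return self


@dataclass(frozen=True)
class Later:
    """Placeholder for the rest of the diagram, which is not modelled here."""
    name: str
    handle: int
    bytes_read: int
    file_size: int

    def step(self, ev: Event):
        return self


def run(events: Iterable[Event]):
    """Single pass over the trail; returns the rule active at the end."""
    rule = Open()
    for ev in events:
        rule = rule.step(ev)  # unrelated events leave the rule unchanged
    return rule
```

Each rule object carries everything learned in earlier states as its fields, which mirrors the way the actual parameters of a rule encode the history of the scenario.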
`
`
`audit trail analysis version is applicable only to a single audit trail. The other version allows a distributed
`analysis of multiple audit trails produced at various machines on a network. In the latter version, ASAX
`filters audit data at each monitored node and analyses the filtered data gathered at a central host (see
`[MLCHZ95]). In the following, we describe briefly the main features of ASAX.
`
`5.1 UNIVERSALITY
`
`ASAX is theoretically able to analyse arbitrary sequential files. No semantic restrictions are imposed on the
`file being analysed. For instance, analysed files could be trace data, generated by a process controller, or
`audit data, collected in a multi-user environment. In the context of this paper, the sequential file is the
`activity data record produced by the PC emulator. The universality is attained by translating native files to a
`generic format which is the only one supported by the evaluator. The format is simple and flexible enough to
`allow straightforward conversion of most file formats. This generic format is referred to as the Normalized
`Audit Data Format (NADF).
`
`An NADF file is a sequential file of records in NADF format. An NADF record consists of the following:
`
`• a four-byte integer representing the length (in bytes) of the whole NADF record (including the length
`field);
`• a certain number of contiguous audit data fields. Each audit data field contains the three following
`contiguous items:
`identifier: an unsigned short (16-bit) integer which is the identifier of the audit data.
`This item must be aligned on a 2-byte boundary;
`length: an unsigned short integer which is the length of the audit data value;
`value: the actual audit data value.
`
`In addition, audit data identifiers appearing in an NADF record must be sorted in strict ascending order.
`This is important for ASAX to preprocess audit records efficiently before analysis. A user guide for
`constructing NADF files is presented in [Mou95].
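The record layout just described can be sketched with Python's `struct` module. Several details are assumptions made for this example because the description above leaves them open: little-endian byte order, one zero byte of padding after an odd-length value so that the next identifier stays 2-byte aligned, and padding that counts towards the record length but not towards the field length. The function names are hypothetical.

```python
import struct


def pack_record(fields: dict[int, bytes]) -> bytes:
    """Build one record from {identifier: value}."""
    body = bytearray()
    for ident in sorted(fields):  # identifiers in strict ascending order
        value = fields[ident]
        body += struct.pack("<HH", ident, len(value)) + value
        if len(body) % 2:         # assumed padding to keep 2-byte alignment
            body += b"\x00"
    # The leading length covers the whole record, including itself.
    return struct.pack("<I", 4 + len(body)) + bytes(body)


def unpack_record(record: bytes) -> dict[int, bytes]:
    """Inverse of pack_record, with basic consistency checks."""
    (total,) = struct.unpack_from("<I", record, 0)
    if total != len(record):
        raise ValueError("length field does not match record size")
    fields: dict[int, bytes] = {}
    offset, previous = 4, -1
    while offset < total:
        ident, size = struct.unpack_from("<HH", record, offset)
        if ident <= previous:
            raise ValueError("identifiers must be strictly ascending")
        offset += 4
        fields[ident] = bytes(record[offset:offset + size])
        offset += size + (size % 2)  # skip the value and any padding byte
        previous = ident
    return fields
```

Keeping the identifiers sorted lets a reader stop scanning a record as soon as it passes the identifier it is looking for.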
`
`5.2 POWER: THE RUSSEL LANGUAGE
`RUSSEL (RUle-baSed Sequence Evaluation Language) is a novel language, specifically tailored to the
`problem of searching arbitrary patterns of records in sequential files. The built-in mechanism of rule
`triggering allows a single pass analysis of the sequential file from left to right.
`
`The language provides common control structures such as conditional, repetitive, and compound actions.
`Primitive actions include assignment, external routine call and rule triggering. A RUSSEL program
`simply consists of a set of rule declarations which are made of a rule name, a list of formal parameters and
`local variables, and an action part. RUSSEL also supports modules sharing global variables and exported
`rule declarations. The operational semantics of RUSSEL can be briefly described as follows:
`
`• Records are analysed sequentially. The analysis of the current re