`Proceedings of the
`12th USENIX
`Security Symposium
`
`
`August 4-8, 2003
`Washington, D.C., USA
`
`Blue Coat Systems - Exhibit 1010 Page 1
`
`
`
For additional copies of these proceedings contact:

USENIX Association
2560 Ninth Street, Suite 215
Berkeley, CA 94710 USA
Phone: 510 528 8649
FAX: 510 548 5738
Email: office@usenix.org
URL: http://www.usenix.org
`
Past USENIX Security Proceedings

Security XI     Washington, D.C., USA         August 2002     $30/38
Security X      Washington, D.C., USA         August 2001     $30/38
Security IX     Denver, Colorado, USA         August 2000     $27/35
Security VIII   Washington, D.C., USA         August 1999     $27/35
Security VII    San Antonio, Texas, USA       January 1998    $27/35
Security VI     San Jose, California, USA     July 1996       $27/35
Security V      Salt Lake City, Utah, USA     June 1995       $27/35
Security IV     Santa Clara, California, USA  October 1993    $15/20
Security III    Baltimore, Maryland, USA      September 1992  $30/39
Security II     Portland, Oregon, USA         August 1990     $13/16
Security        Portland, Oregon, USA         August 1988     $7/7
`
`© 2003 by The USENIX Association
`All Rights Reserved
`
`This volume is published as a collective work. Rights to individual papers
`remain with the author or the author's employer. Permission is granted for the
`noncommercial reproduction of the complete work for educational or research
`purposes. USENIX acknowledges all trademarks herein.
`
`ISBN 1-931971-13-7
`
`Printed in the United States of America on 50% recycled paper, 10-15% post-consumer waste.
`
`
`
`
Static Analysis of Executables to Detect Malicious Patterns*

Mihai Christodorescu
mihai@cs.wisc.edu

Somesh Jha
jha@cs.wisc.edu

Computer Sciences Department
University of Wisconsin, Madison

Abstract
Malicious code detection is a crucial component of any defense mechanism. In this paper, we present a unique viewpoint on malicious code detection. We regard malicious code detection as an obfuscation-deobfuscation game between malicious code writers and researchers working on malicious code detection. Malicious code writers attempt to obfuscate the malicious code to subvert the malicious code detectors, such as anti-virus software. We tested the resilience of three commercial virus scanners against code-obfuscation attacks. The results were surprising: the three commercial virus scanners could be subverted by very simple obfuscation transformations! We present an architecture for detecting malicious patterns in executables that is resilient to common obfuscation transformations. Experimental results demonstrate the efficacy of our prototype tool, SAFE (a static analyzer for executables).
`
`1 Introduction
`
In the interconnected world of computers, malicious code has become an omnipresent and dangerous threat. Malicious code can infiltrate hosts using a variety of methods such as attacks against known software flaws, hidden functionality in regular programs, and social engineering. Given the devastating effect malicious code has on our cyber infrastructure, identifying malicious programs is an important goal. Detecting the presence of malicious code on a given host is a crucial component of any defense mechanism.
Malicious code is usually classified [30] according to its propagation method and goal into the following categories:
`• viruses are programs that self-replicate within a host
`by attaching themselves to programs and/or documents
`that become carriers of the malicious code;
`• worms self-replicate across a network;
• trojan horses masquerade as useful programs, but contain malicious code to attack the system or leak data;
• back doors open the system to external entities by subverting the local security policies to allow remote access and control over a network;
• spyware is a useful software package that also transmits private user data to an external entity.

*This work was supported in part by the Office of Naval Research under contracts N00014-01-1-0796 and N00014-01-1-0708. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes, notwithstanding any copyright notices affixed thereon.
The views and conclusions contained herein are those of the authors, and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the above government agencies or the U.S. Government.
Combining two or more of these malicious code categories can lead to powerful attack tools. For example, a worm can contain a payload that installs a back door to allow remote access. When the worm replicates to a new system (via email or other means), the back door is installed on that system, thus providing an attacker with a quick and easy way to gain access to a large set of hosts. Staniford et al. have demonstrated that worms can propagate extremely quickly through a network, and thus potentially cripple the entire cyber infrastructure [43]. In a recent outbreak, the Sapphire/SQL Slammer worm reached its peak infection rate in about 10 minutes after launch, doubling every 8.5 seconds [31]. Once the backdoor tool gains a large installed base, the attacker can use the compromised hosts to launch a coordinated attack, such as a distributed denial-of-service (DDoS) attack [5].
In this paper, we develop a methodology for detecting malicious patterns in executables. Although our method is general, we have initially focused our attention on viruses. A computer virus replicates itself by inserting a copy of its code (the viral code) into a host program. When a user executes the infected program, the virus copy runs, infects more programs, and then the original program continues to execute. To the casual user, there is no perceived difference between the clean and the infected copies of a program until the virus activates its malicious payload.
The classic virus-detection techniques look for the presence of a virus-specific sequence of instructions (called a virus signature) inside the program: if the signature is found, it is highly probable that the program is infected. For example, the Chernobyl/CIH virus is detected by checking for the hexadecimal sequence [47]:

E800 0000 005B 8D4B 4251 5050
0F01 4C24 FE5B 83C3 1CFA 8B2B
`
This corresponds to the following IA-32 instruction sequence, which constitutes part of the virus body:

E8 00000000      call 0h
5B               pop ebx
8D 4B 42         lea ecx, [ebx + 42h]
51               push ecx
50               push eax
50               push eax
0F01 4C 24 FE    sidt [esp - 02h]
5B               pop ebx
83 C3 1C         add ebx, 1Ch
FA               cli
8B 2B            mov ebp, [ebx]
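The byte-level signature scan described above amounts to a substring search over the program image. A minimal sketch (in Python; the hex pattern is the Chernobyl/CIH signature quoted above, while the padding bytes around it are invented for illustration):

```python
# Classic signature scanning: search a program image for a fixed
# byte sequence. The pattern below is the CIH signature quoted in
# the text; the "infected" and "clean" images are toy examples.
CIH_SIGNATURE = bytes.fromhex(
    "E800000000" "5B" "8D4B42" "51" "50" "50"
    "0F014C24FE" "5B" "83C31C" "FA" "8B2B"
)

def scan(image: bytes, signature: bytes) -> bool:
    """Report whether the signature occurs anywhere in the image."""
    return image.find(signature) != -1

# A toy "infected" image: arbitrary padding around the signature.
infected = b"\x90" * 16 + CIH_SIGNATURE + b"\xC3"
clean = b"\x90" * 16 + b"\xC3"

print(scan(infected, CIH_SIGNATURE))  # True
print(scan(clean, CIH_SIGNATURE))     # False
```

Because the match is exact, any change to the stored bytes of the virus body defeats this scan, which is precisely what the obfuscation transformations discussed below exploit.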
`
This classic detection approach is effective when the virus code does not change significantly over time. Detection is also easier when viruses originate from the same source code, with only minor modifications and updates. Thus, a virus signature can be common to several virus variants. For example, Chernobyl/CIH versions 1.2, 1.3, and 1.4 differ mainly in the trigger date on which the malicious code becomes active and can be effectively detected by scanning for a single signature, namely the one shown above.
The virus writers and the antivirus software developers are engaged in an obfuscation-deobfuscation game. Virus writers try to obfuscate the "vanilla" virus so that signatures used by the antivirus software cannot detect these "morphed" viruses. Therefore, to detect an obfuscated virus, the virus scanners first must undo the obfuscation transformations used by the virus writers. In this game, virus writers are obfuscators and researchers working on malicious code detection are deobfuscators. A method to detect malicious code should be resistant to common obfuscation transformations. This paper introduces such a method. The main contributions of this paper include:
• The obfuscation-deobfuscation game and attacks on commercial virus scanners
We view malicious code detection as an obfuscation-deobfuscation game between the virus writers and the researchers working to detect malicious code. Background on some common obfuscation techniques used by virus writers is given in Section 3. We also have developed an obfuscator for executables. Surprisingly, the three commercial virus scanners we considered could be easily thwarted by simple obfuscation transformations (Section 4). For example, in some cases the Norton antivirus scanner could not even detect insertions of nop instructions.
• A general architecture for detecting malicious patterns in executables
We introduce a general architecture for detecting malicious patterns in executables. An overview of the architecture and its novel features is given in Section 5. External predicates and uninterpreted symbols are two important elements in our architecture. External predicates are used to summarize results of various static analyses, such as points-to and live-range analysis. We allow these external predicates to be referred to in the abstraction patterns that describe the malicious code. Moreover, we allow uninterpreted symbols in patterns, which makes the method resistant to renaming, a common obfuscation transformation. Two key components of our architecture, the program annotator and the malicious code detector, are described in Sections 6 and 7 respectively.
• Prototype for x86 executables
We have implemented a prototype for detecting malicious patterns in x86 executables. The tool is called a static analyzer for executables, or SAFE. We have successfully tried SAFE on multiple viruses; for brevity we report on our experience with four specific viruses. Experimental results (Section 8) demonstrate the efficacy of SAFE. There are several interesting directions we intend to pursue as future work, which are summarized in Section 9.
• Extensibility of analysis
SAFE depends heavily on static analysis techniques. As a result, the precision of the tool directly depends on the static analysis techniques that are integrated into it. In other words, SAFE is as good as the static analysis techniques it is built upon. For example, if SAFE uses the result of points-to analysis, it will be able to track values across memory references. In the absence of a points-to analyzer, SAFE makes the conservative assumption that a memory reference can access any memory location (i.e., everything points to everything). We have designed SAFE so that various static analysis techniques can be readily integrated into it. Several simple static analysis techniques are already implemented in SAFE.

2 Related Work
2.1 Theoretical Discussion
The theoretical limits of malicious code detection (specifically of virus detection) have been the focus of many researchers. Cohen [10] and Chess-White [9]
`
`
`
showed that in general the problem of virus detection is undecidable. Similarly, several important static analysis problems are undecidable or computationally hard [28, 35].
However, the problem considered in this paper is slightly different than the one considered by Cohen [10] and Chess-White [9]. Assume that we are given a vanilla virus V which contains a malicious sequence of instructions σ. Next we are given an obfuscated version O(V) of the virus. The problem is to find whether there exists a sequence of instructions σ′ in O(V) which is "semantically equivalent" to σ. A recent result by Vadhan et al. [3] proves that in general program obfuscation is impossible. This leads us to believe that a computationally bounded adversary will not be able to obfuscate a virus to completely hide its malicious behavior. We will further explore these theoretical issues in the future.
`
2.2 Other Detection Techniques
Our work is closely related to previous results on static analysis techniques for verifying security properties of software [1, 4, 8, 7, 25, 29]. In a larger context, our work is similar to existing research on software verification [2, 13]. However, there are several important differences. First, viewing malicious code detection as an obfuscation-deobfuscation game is unique. The obfuscation-deobfuscation viewpoint led us to explore obfuscation attacks upon commercial virus scanners. Second, to our knowledge, all existing work on static analysis techniques for verifying security properties analyzes source code. On the other hand, our analysis technique works on executables. In certain contexts, such as virus detection, source code is not available. Finally, we believe that using uninterpreted variables in the specification of the malicious code is unique (Section 6.2).
Currie et al. looked at the problem of automatically checking the equivalence of DSP routines in the context of verifying the correctness of optimizing transformations [15]. Their approach is similar to ours, but they impose a set of simplifying assumptions for their simulation tool to execute with reasonable performance. Feng and Hu took this approach one step further by using a theorem prover to determine when to unroll loops [19]. In both cases the scope of the problem is limited to VLIW or DSP code and there is limited support for user-specified analyses. Our work is applied to x86 (IA-32) assembly and can take full advantage of static analyses available to the user of our SAFE tool. Necula adopts a similar approach based on comparing a transformed code sequence against the original code sequence in the setting of verifying the correctness of the GNU C compiler [38]. Using knowledge of the transformations performed by the compiler, equivalence between the compiler input and the compiler output is proven using a simulation relation. In our work, we require no a priori knowledge of the obfuscation transformations performed, as it would be unrealistic to expect such information in the presence of malicious code.
We plan to enhance our framework by using the ideas from existing work on type systems for assembly code. We are currently investigating Morrisett et al.'s Typed Assembly Language [32, 33]. We apply a simple type system (Section 6) to the binaries we analyze by manually inserting the type annotations. We are unaware of a compiler that can produce Typed Assembly Language, and thus we plan to support external type annotations to enhance the power of our static analysis.
Dynamic monitoring can also be used for malicious code detection. Cohen [10] and Chess-White [9] propose a virus detection model that executes code in a sandbox. Another approach rewrites the binary to introduce checks driven by an enforceable security policy [17] (known as the inline reference monitor or the IRM approach). We believe static analysis can be used to improve the efficiency of dynamic analysis techniques, e.g., static analysis can remove redundant checks in the IRM framework. We construct our models for executables similar to the work done in specification-based monitoring [21, 46], and apply our detection algorithm in a context-insensitive fashion. Other research used context-sensitive analysis by employing push-down systems (PDSs). Analyses described in [7, 25] use the model checking algorithms for pushdown systems [18] to verify security properties of programs. The data structures used in interprocedural slicing [23], interprocedural DFA [40], and Boolean programs [2] are hierarchically structured graphs and can be translated to pushdown systems.
`
2.3 Other Obfuscators
While deciding on the initial obfuscation techniques to focus on, we were influenced by several existing tools. Mistfall (by z0mbie) is a library for binary obfuscation, specifically written to blend malicious code into a host program [49]. It can encrypt, morph, and blend the virus code into the host program. Our binary obfuscator is very similar to Mistfall. Unfortunately, we could not successfully morph binaries using Mistfall, so we could not perform a direct comparison between our obfuscator and Mistfall. Burneye (by TESO) is a Linux binary encapsulation tool. Burneye encrypts a binary (possibly multiple times), and packages it into a new binary with an extraction tool [45]. In this paper, we have not considered encryption-based obfuscation techniques. In the future, we will incorporate encryption-based obfuscation techniques into our tool, by incorporating or extending existing libraries.
`
`
`
`
3 Background on Obfuscating Viruses
To detect obfuscated viruses, antivirus software has become more complex. This section discusses some common obfuscation transformations used by virus writers and how antivirus software has historically dealt with obfuscated viruses.
A polymorphic virus uses multiple techniques to prevent signature matching. First, the virus code is encrypted, and only a small in-clear routine is designed to decrypt the code before running the virus. When the polymorphic virus replicates itself by infecting another program, it encrypts the virus body with a newly-generated key, and it changes the decryption routine by generating new code for it. To obfuscate the decryption routine, several transformations are applied to it. These include: nop-insertion, code transposition (changing the order of instructions and placing jump instructions to maintain the original semantics), and register reassignment (permuting the register allocation). These transformations effectively change the virus signature (Figure 1), inhibiting effective signature scanning by an antivirus tool.
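The re-encryption step of replication can be sketched abstractly: each generation encrypts the virus body under a fresh key, so the stored bytes differ from copy to copy while the decrypted body stays the same. A minimal illustration (single-byte XOR in Python; the payload bytes are an invented fragment, and a real polymorphic engine also mutates the decryptor code itself):

```python
import os

def encrypt_body(body: bytes, key: int) -> bytes:
    """XOR-encrypt the virus body under a one-byte key
    (XOR is its own inverse, so this also decrypts)."""
    return bytes(b ^ key for b in body)

def replicate(body: bytes) -> tuple[bytes, int]:
    """Produce a new generation: same body, freshly chosen key."""
    key = os.urandom(1)[0] | 1  # avoid key 0 (identity mapping)
    return encrypt_body(body, key), key

body = bytes.fromhex("8D4B4251")   # illustrative payload fragment
stored1, k1 = replicate(body)
stored2, k2 = replicate(body)

# The on-disk bytes differ between generations (whenever k1 != k2),
# but decryption always recovers the same body:
assert encrypt_body(stored1, k1) == body
assert encrypt_body(stored2, k2) == body
```

A fixed byte signature over the encrypted body is therefore useless; only the small decryption routine remains in the clear, which is why the transformations above target exactly that routine.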
The obfuscated code in Figure 1 will behave in the same manner as before, since the nop instruction has no effect other than incrementing the program counter1. However, the signature has changed. Analysis can detect simple obfuscations, like nop-insertion, by using regular expressions instead of fixed signatures. To catch nop-insertions, the signature should allow for any number of nops at instruction boundaries (Figure 2). In fact, most modern antivirus software use regular expressions as virus signatures.
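The extended-signature idea can be sketched over a hex-string view of the program: permit any number of nop bytes (90) at each instruction boundary, as in Figure 2. A minimal illustration in Python (not a production scanner; the byte strings are the CIH fragment and its nop-obfuscated variant):

```python
import re

# Extended signature: the CIH byte pattern with (90)* allowed at
# each instruction boundary (cf. Figure 2).
SIGNATURE = re.compile(
    "E800000000(90)*5B(90)*8D4B42(90)*51(90)*50(90)*50(90)*"
    "0F014C24FE(90)*5B(90)*83C31C(90)*FA(90)*8B2B"
)

def scan_hex(image: bytes) -> bool:
    """Match the nop-tolerant signature against the hex image."""
    return SIGNATURE.search(image.hex().upper()) is not None

original = bytes.fromhex(
    "E8000000005B8D4B425150500F014C24FE5B83C31CFA8B2B")
# Same code with nops inserted after lea, after the second
# push eax, and after add (as in the obfuscated column of Figure 1):
obfuscated = bytes.fromhex(
    "E8000000005B8D4B4290515050900F014C24FE5B83C31C90FA8B2B")

print(scan_hex(original), scan_hex(obfuscated))  # True True
```

The regular expression catches every nop-padded variant of this one instruction sequence, but, as the next paragraphs argue, it is still a purely syntactic match and fails against stronger transformations.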
Antivirus software deals with polymorphic viruses by performing heuristic analyses of the code (such as checking only certain program locations for virus code, as most polymorphic viruses attach themselves only at the beginning or end of the executable binary [37]), and even emulating the program in a sandbox to catch the virus in action [36]. The emulation technique is effective because at some point during the execution of the infected program, the virus body appears decrypted in main memory, ready for execution; the detection comes down to frequently scanning the in-memory image of the program for virus signatures while the program executes.
Metamorphic viruses attempt to evade heuristic detection techniques by using more complex obfuscations. When they replicate, these viruses change their code in a variety of ways, such as code transposition, substitution of equivalent instruction sequences, and register reassignment [44, 51]. Furthermore, they can "weave" the virus code into the host program, making detection by traditional heuristics almost impossible since the virus code is mixed with program code and the virus entry point is no longer at the beginning of the program (these are designated as entry point obscuring (EPO) viruses [26]).
As virus writers employ more complex obfuscation techniques, heuristic virus-detection techniques are bound to fail. Therefore, there is a need to perform a deeper analysis of malicious code based upon more sophisticated static-analysis techniques. In other words, inspection of the code to detect malicious patterns should use structures that are closer to the semantics of the code, as purely syntactic techniques, such as regular expression matching, are no longer adequate.
`
3.1 The Suite of Viruses
We have analyzed multiple viruses using our tool, and discuss four of them in this paper. Descriptions of these viruses are given below.

3.1.1 Detailed Description of the Viruses
Chernobyl (CIH)
According to the Symantec Antivirus Research Center (SARC), Chernobyl/CIH is a virus that infects 32-bit Windows 95/98/NT executable files [41]. When a user executes an infected program under Windows 95/98/ME, the virus becomes resident in memory. Once the virus is resident, CIH infects other files when they are accessed. Infected files may have the same size as the original files because of CIH's unique mode of infection: the virus searches for empty, unused spaces in the file2. Next it breaks itself up into smaller pieces and inserts its code into these unused spaces. Chernobyl has two different payloads: the first one overwrites the hard disk with random data, starting at the beginning of the disk (sector 0) using an infinite loop. The second payload tries to cause permanent damage to the computer by corrupting the Flash BIOS.
z0mbie-6.b
The z0mbie-6.b virus includes an interesting feature: the polymorphic engine hides every piece of the virus, and the virus code is added to the infected file as a chain of differently-sized routines, making standard signature detection techniques almost useless.
f0sf0r0
The f0sf0r0 virus uses a polymorphic engine combined with an EPO technique to hide its entry point. According to Kaspersky Labs [27], when an infected file is run and the virus code gains control, it searches for portable executable files in the system directories and infects them. While infecting, the virus encrypts itself with a polymorphic loop and writes the result to the end of the file. To gain control when the infected file is run, the virus does not modify the program's start address, but instead writes a "jmp (virus_entry)" instruction into the middle of the file.
`
`
`
`
Original code:
E8 00000000      call 0h
5B               pop ebx
8D 4B 42         lea ecx, [ebx + 42h]
51               push ecx
50               push eax
50               push eax
0F01 4C 24 FE    sidt [esp - 02h]
5B               pop ebx
83 C3 1C         add ebx, 1Ch
FA               cli
8B 2B            mov ebp, [ebx]

Obfuscated code:
E8 00000000      call 0h
5B               pop ebx
8D 4B 42         lea ecx, [ebx + 42h]
90               nop
51               push ecx
50               push eax
50               push eax
90               nop
0F01 4C 24 FE    sidt [esp - 02h]
5B               pop ebx
83 C3 1C         add ebx, 1Ch
90               nop
FA               cli
8B 2B            mov ebp, [ebx]

Signature:
E800 0000 005B 8D4B 4251 5050
0F01 4C24 FE5B 83C3 1CFA 8B2B

New signature:
E800 0000 005B 8D4B 4290 5150
5090 0F01 4C24 FE5B 83C3 1C90
FA8B 2B

Figure 1: Original code and obfuscated code from Chernobyl/CIH, and their corresponding signatures. Newly added instructions are highlighted.
`
Hare
Finally, the Hare virus infects the boot sectors of floppy disks and hard drives, as well as executable programs. When the payload is triggered, the virus overwrites random sectors on the hard disk, making the data inaccessible. The virus spreads by polymorphically changing its decryption routine and encrypting its main body.
The Hare and Chernobyl/CIH viruses are well known in the antivirus community, with their presence in the wild peaking in 1996 and 1998, respectively. In spite of this, we discovered that current commercial virus scanners could not detect slightly obfuscated versions of these viruses.

E800 0000 00(90)* 5B(90)* 8D4B 42(90)*
51(90)* 50(90)* 50(90)* 0F01 4C24 FE(90)*
5B(90)* 83C3 1C(90)* FA(90)* 8B2B

Figure 2: Extended signature to catch nop-insertion.

4 Obfuscation Attacks on Commercial Virus Scanners

We tested three commercial virus scanners against several common obfuscation transformations. To test the resilience of commercial virus scanners to common obfuscation transformations, we have developed an obfuscator for binaries. Our obfuscator supports four common obfuscation transformations: dead-code insertion, code transposition, register reassignment, and instruction substitution. While there are other generic obfuscation techniques [11, 12], those described here seem to be preferred by malicious code writers, possibly because implementing them is easy and they add little to the memory footprint.

4.1 Common Obfuscation Transformations
`
4.1.1 Dead-Code Insertion

Also known as trash insertion, dead-code insertion adds code to a program without modifying its behavior. Inserting a sequence of nop instructions is the simplest example. More interesting obfuscations involve constructing challenging code sequences that modify the program state, only to restore it immediately.
Some code sequences are designed to fool antivirus software that solely relies on signature matching as its detection mechanism. Other code sequences are complicated enough to make automatic analysis very time-consuming, if not impossible. For example, passing values through memory rather than registers or the stack requires accurate pointer analysis to recover values. The example shown in Figure 3 should clarify this. The code marked by (*) can be easily eliminated by automated analysis. On the other hand, the second and third insertions, marked by (**), do cancel out but the analysis is more complex. Our obfuscator supports dead-code insertion.
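The simplest case, nop-insertion at instruction boundaries, can be sketched as follows (a minimal illustration in Python over a textual instruction list; the instruction strings and probability are invented for this sketch, and a real obfuscator works on encoded bytes):

```python
import random

def insert_dead_code(instructions, p=0.5, seed=0):
    """Insert nop instructions at random instruction boundaries.
    Behavior is preserved: nop only advances the program counter."""
    rng = random.Random(seed)
    out = []
    for ins in instructions:
        out.append(ins)
        if rng.random() < p:
            out.append("nop")
    return out

code = ["pop ebx", "lea ecx, [ebx + 42h]", "push ecx"]
padded = insert_dead_code(code)

# Stripping the inserted nops recovers the original sequence.
assert [i for i in padded if i != "nop"] == code
```

The more interesting dead-code sequences mentioned above (state-modifying pairs that cancel out) would require the analyzer to reason about program state rather than filter a fixed opcode, which is exactly what makes them harder to eliminate.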
Not all dead-code sequences can be detected and eliminated, as this problem reduces to program equivalence (i.e., is this code sequence equivalent to an empty program?), which is undecidable. We believe that many common dead-code sequences can be detected and eliminated with acceptable performance. To quote the documentation of the RPME virus permutation engine [50],
`
[T]rash [does not make the] program more complex [...]. If [the] detecting algorithm will be written such as I think, then there is no difference between NOP and more complex trash.

Our detection tool, SAFE, identifies several kinds of such dead-code segments.
`
4.1.2 Code Transposition
Code transposition shuffles the instructions so that the order in the binary image is different from the execution order, or from the order of instructions assumed in the signature used by the antivirus software. To achieve the first variation, we randomly reorder the instructions and insert unconditional branches or jumps to restore the original control flow. The second variation swaps instructions if they are not interdependent, similar to compiler code generation, but with the different goal of randomizing the instruction stream.
The two versions of this obfuscation technique differ in their complexity. The code transposition technique based upon unconditional branches is relatively easy to implement. The second technique that interchanges independent instructions is more complicated because the independence of instructions must be ascertained. On the analysis side, code transposition can complicate matters only for a human. Most automatic analysis tools (including ours) use an intermediate representation, such as the control flow graph (CFG) or the program dependence graph (PDG) [23], that is not sensitive to superfluous changes in control flow. Note that an optimizer acts as a deobfuscator in this case by finding the unnecessary unconditional branches and removing them from the program code. Currently, our obfuscator supports only code transposition based upon inserting unconditional branches.
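The branch-based variation can be sketched as follows: shuffle the blocks in the image, append an explicit jump to each so execution still visits them in the original order, and observe that following the jump chain (as a CFG-based analysis effectively does) recovers that order. A minimal illustration in Python (the block contents, label scheme, and seed are invented for this sketch):

```python
import random

def transpose(blocks, seed=1):
    """Shuffle basic blocks, appending an explicit 'jmp L<n>' to each
    so the original control flow is preserved. The entry block stays
    first; the final block ends with 'ret' instead of a jump."""
    labeled = [(i, block + [f"jmp L{i + 1}"])
               for i, block in enumerate(blocks)]
    labeled[-1] = (len(blocks) - 1, blocks[-1] + ["ret"])
    rng = random.Random(seed)
    return labeled[:1] + rng.sample(labeled[1:], len(labeled) - 1)

def execution_order(shuffled):
    """Follow the jump chain to recover the execution order,
    which is what a CFG-based representation sees."""
    by_label = dict(shuffled)
    order, label = [], 0
    while True:
        block = by_label[label]
        order.append(label)
        if block[-1] == "ret":
            return order
        label = int(block[-1].split("L")[1])

blocks = [["push ecx"], ["push eax"],
          ["sidt [esp - 02h]"], ["pop ebx"]]
shuffled = transpose(blocks)
assert execution_order(shuffled) == [0, 1, 2, 3]
```

The image order of the shuffled blocks is randomized, defeating a linear byte signature, yet the recovered execution order is unchanged, which is why CFG-based representations are insensitive to this transformation.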
`
4.1.3 Register Reassignment
The register reassignment transformation replaces usage of one register with another in a specific live range. This technique exchanges register names and has no other effect on program behavior. For example, if register ebx is dead throughout a given live range of the register eax, it can replace eax in that live range. In certain cases, register reassignment requires insertion of prologue and epilogue code around the live range to restore the state of various registers. Our binary obfuscator supports this code transformation.
The purpose of this transformation is to subvert the antivirus software analyses that rely upon signature matching. There is no real obfuscatory value gained in this process. Conceptually, the deobfuscation challenge is equally complex before or after the register reassignment.
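As a minimal sketch of the transformation (in Python, over a textual instruction list; the live-range bounds are supplied by hand here, whereas a real obfuscator would compute them with liveness analysis):

```python
def reassign(instructions, old="eax", new="ebx", start=0, end=None):
    """Rename one register to another within the live range
    [start, end), assuming the new register is dead there."""
    end = len(instructions) if end is None else end
    return [
        ins.replace(old, new) if start <= i < end else ins
        for i, ins in enumerate(instructions)
    ]

code = ["mov eax, 5", "add eax, 1", "push eax"]
print(reassign(code))  # ['mov ebx, 5', 'add ebx, 1', 'push ebx']
```

A byte signature keyed to eax no longer matches the renamed code, which is why patterns with uninterpreted register symbols (rather than concrete register names) are immune to this transformation.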
`
4.1.4 Instruction Substitution
This obfuscation technique uses a dictionary of equivalent instruction sequences to replace one instruction sequence with another. Since this transformation relies upon human knowledge of equivalent instructions, it poses the toughest challenge for automatic detection of malicious code. The IA-32 instruction set is especially rich, and provides several ways of performing the same operation. Coupled with several architecturally ambivalent features (e.g., a memory-based stack that can be accessed both as a stack using dedicated instructions and as a memory area using standard memory operations), the IA-32 assembly language provides ample opportunity for instruction substitution.
To handle obfuscation based upon instruction substitution, an analysis tool must maintain a dictionary of equivalent instruction sequences, similar to the dictionary used to generate them. This is not a comprehensive solution, but it can cope with the common cases. In the case of IA-32, the problem can be slightly simplified by using a simple intermediate language that "unwinds" the complex operations corresponding to each IA-32 instruction. In some cases, a theorem prover such as Simplify [16] or PVS [39] can also be used to prove that two sequences of instructions are equivalent.
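The dictionary-driven normalization can be sketched as follows (a minimal illustration in Python; the table entries are invented examples, and some pairs such as mov/xor are only equivalent up to flag effects, which a real tool must account for):

```python
# A tiny dictionary of equivalent IA-32 sequences (illustrative
# entries only; "xor eax, eax" also modifies the flags, so real
# equivalence is context-dependent). The same table can drive both
# directions: an obfuscator substitutes canonical -> variant, while
# the analyzer below normalizes variant -> canonical.
EQUIVALENTS = {
    ("mov eax, 0",): ("xor eax, eax",),
    ("push eax",): ("sub esp, 4", "mov [esp], eax"),
}

def normalize(instructions):
    """Rewrite known equivalent sequences back to canonical form."""
    inverse = {v: k for k, v in EQUIVALENTS.items()}
    out, i = [], 0
    seq = tuple(instructions)
    while i < len(seq):
        for variant, canonical in inverse.items():
            if seq[i:i + len(variant)] == variant:
                out.extend(canonical)
                i += len(variant)
                break
        else:
            out.append(seq[i])
            i += 1
    return out

mutated = ["xor eax, eax", "sub esp, 4", "mov [esp], eax"]
assert normalize(mutated) == ["mov eax, 0", "push eax"]
```

After normalization, a single canonical pattern covers every variant listed in the dictionary; sequences outside the dictionary still slip through, which is the sense in which this approach is not comprehensive.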
`
4.2 Testing Commercial Antivirus Tools
We tested three commercial virus scanners using obfuscated versions of the four viruses described earlier. The results were quite surprising: a combination of nop-insertion and code transposition was enough to create obfuscated versions of the viruses that the commercial virus scanners could not detect. Moreover, the Norton antivirus software could not detect an obfuscated version of the Chernobyl virus using just nop-insertions. SAFE was resistant to the two obfuscation transformations. The results are summarized in Table 1. A ✓ indicates that the antivirus software detected the virus. A ✗ means that the software did not detect the virus. Note that unobfuscated versions of all four viruses were detected by all the tools.
`
`5 Architecture
`This section gives an overview of the architecture of
`SAFE (Figure 4). Subsequent sections provide detailed
`descriptions of the major components of SAFE.
`To detect malicious patterns in executables, we build
`an abstract representation of the malicio