Limits of Static Analysis for Malware Detection

Andreas Moser, Christopher Kruegel, and Engin Kirda
Secure Systems Lab
Technical University Vienna
{andy,chris,ek}@seclab.tuwien.ac.at

Abstract
Malicious code is an increasingly important problem that threatens the security of computer systems. The traditional line of defense against malware is composed of malware detectors such as virus and spyware scanners. Unfortunately, both researchers and malware authors have demonstrated that these scanners, which use pattern matching to identify malware, can be easily evaded by simple code transformations. To address this shortcoming, more powerful malware detectors have been proposed. These tools rely on semantic signatures and employ static analysis techniques such as model checking and theorem proving to perform detection. While it has been shown that these systems are highly effective in identifying current malware, it is less clear how successful they would be against adversaries that take into account the novel detection mechanisms.

The goal of this paper is to explore the limits of static analysis for the detection of malicious code. To this end, we present a binary obfuscation scheme that relies on the idea of opaque constants, which are primitives that allow us to load a constant into a register such that an analysis tool cannot determine its value. Based on opaque constants, we build obfuscation transformations that obscure program control flow, disguise access to local and global variables, and interrupt tracking of values held in processor registers. Using our proposed obfuscation approach, we were able to show that advanced semantics-based malware detectors can be evaded. Moreover, our opaque constant primitive can be applied in a way such that it is provably hard to analyze for any static code analyzer. This demonstrates that static analysis techniques alone might no longer be sufficient to identify malware.

1 Introduction

Malicious code (or malware) is defined as software that fulfills the harmful intent of an attacker. The damage caused by malware has dramatically increased in the past few years [8]. One reason is the rising popularity of the Internet and the resulting increase in the number of available vulnerable machines because of security-unaware users. Another reason is the elevated sophistication of the malicious code itself.

Current systems to detect malicious code (most prominently, virus scanners) are largely based on syntactic signatures. That is, these systems are equipped with a database of regular expressions that specify byte or instruction sequences that are considered malicious. A program is declared malware when one of the signatures is identified in the program’s code.

Recent work [2] has demonstrated that techniques such as polymorphism and metamorphism are successful in evading commercial virus scanners. The reason is that syntactic signatures are ignorant of the semantics of instructions. To address this problem, a novel class of semantics-aware malware detectors was proposed. These detectors [3, 10, 11] operate with abstract models, or templates, that describe the behavior of malicious code. Because the syntactic properties of code are (largely) ignored, these techniques are (mostly) resilient against the evasion attempts discussed above. The premise of semantics-aware malware detectors is that semantic properties are more difficult to morph in an automated fashion than syntactic properties. While this is most likely true, the extent to which this is more difficult is less obvious. On one hand, semantics-aware detection faces the challenge that the problem of deciding whether a certain piece of code exhibits a certain behavior is undecidable in the general case. On the other hand, it is also not trivial for an attacker to automatically generate semantically equivalent code.

The question that we address in this paper is the following: How difficult is it for an attacker to evade semantics-based malware detectors that use powerful static analysis to identify malicious code? We try to answer this question by introducing a binary code obfuscation technique that makes it difficult for an advanced, semantics-based malware detector to properly determine the effect of a piece of code. For this obfuscation process, we use a primitive known as
opaque constant, which denotes a code sequence that loads a constant into a processor register whose value cannot be determined statically. Based on opaque constants, we build a number of obfuscation transformations that are difficult to analyze statically.

Given our obfuscation scheme, the next question that needs to be addressed is how these transformations should be applied to a program. The easiest way, and the approach chosen by most previous obfuscation approaches [6, 20], is to work on the program’s source code. Applying obfuscation at the source code level is the normal choice when the distributor of a binary controls the source (e.g., to protect intellectual property). For malware that is spreading in the wild, source code is typically not available. Also, malware authors are often reluctant to reveal their source code, in order to make analysis more difficult. Thus, to guard against objections that our presented threats are unrealistic, we present a solution that operates directly on binaries.
The core contributions of our paper are as follows:

• We present a binary obfuscation scheme based on the idea of opaque constants. This scheme allows us to demonstrate that static analysis of advanced malware detectors can be thwarted by scrambling control flow and hiding data locations and usage.

• We introduce a binary rewriting tool that allows us to obfuscate Windows and Linux binary programs for which no source code or debug information is available.

• We present experimental results that demonstrate that semantics-aware malware detectors can be evaded successfully. In addition, we show that our binary transformations are robust, allowing us to run real-world obfuscated binaries under both Linux and Windows.
The code obfuscation scheme introduced in this paper provides a strong indication that static analysis alone might not be sufficient to detect malicious code. In particular, we introduce an obfuscation scheme that is provably hard to analyze statically. Because of the many ways in which code can be obfuscated and the fundamental limits in what can be decided statically, we firmly believe that dynamic analysis is a necessary complement to static detection techniques. The reason is that dynamic techniques can monitor the instructions that are actually executed by a program and, thus, are immune to many code obfuscating transformations.
2 Code Obfuscation

In this section, we present the concepts of the transformations that we apply to make the code of a binary difficult to analyze statically. As with most obfuscation approaches, the basic idea behind our transformations is that either some instructions of the original code are replaced by program fragments that are semantically equivalent but more difficult to analyze, or that additional instructions are added to the program that do not change its behavior.
2.1 Opaque Constants

Constant values are ubiquitous in binary code, be it as the target of a control flow instruction, the address of a variable, or an immediate operand of an arithmetic instruction. In its simplest form, a constant is loaded into a register (expressed by a move constant, $register instruction). An important obfuscation technique that we present in this paper is based on the idea of replacing this load operation with a set of semantically equivalent instructions that are difficult to analyze statically. That is, we generate a code sequence that always produces the same result (i.e., a given constant), although this fact would be difficult to detect from static analysis.

Figure 1. Opaque constant calculation
Simple Opaque Constant Calculation   Figure 1 shows one approach to create a code sequence that makes use of random input and different intermediate variable values on different branches. In this code sequence, the value unknown is a random value loaded during runtime. To prepare the opaque constant calculation, the bits of the constant that we aim to create have to be randomly partitioned into two groups. The values of the arrays zero and one are crafted such that after the for loop, all bits of the first group have the correct, final value, while those of the second group depend on the random input (and thus, are unknown). Then, using the appropriate values for set_ones and set_zeros, all bits of the second group are forced to their correct values (while those of the first group are left unchanged). The result is that all bits of constant hold the desired value at the end of the execution of the code.

An important question is how the arrays zero and one can be prepared such that all bits of the first group are guaranteed to hold their correct value. This can be accomplished by ensuring that, for each i, all bits that belong to the first group have the same value for the two array elements zero[i] and one[i]. Thus, independent of whether zero[i] or one[i] is used in the xor operation with constant, the values of all bits in the first group are known after each loop iteration. Of course, the bits that belong to the second group can be randomly chosen for all elements zero[i] and one[i]. Thus, the value of constant itself is different after each loop iteration. Because a static analyzer cannot determine the exact path that will be chosen during execution, the number of possible constant values doubles after each loop iteration. In such a case, the static analyzer would likely have to resort to approximation, in which case the exact knowledge of the constant is lost.

However, an analyzer that tracks the value of each bit individually could still recover the bits of the first group, because these bits have the same value on every path. This problem could be addressed, for example, by introducing a more complex encoding for the constant. If we use, for instance, the relationship between two bits to represent one bit of actual information, we avoid the problem that single bits have the same value on every path. In this case, off-the-shelf static analyzers can no longer track the precise value of any variable.

Of course, given the knowledge of our scheme, the defender always has the option to adapt the analysis such that the used encoding is taken into account. Similar to before, it would be possible to keep the exact values for those variables that encode the same value after each loop iteration. However, this would require special treatment of the particular encoding scheme in use.
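To make the mechanics of this primitive concrete, the following C sketch shows one possible realization of the calculation described above. The names zero, one, set_ones, and set_zeros follow the text, while the prepare() helper and all other details are assumptions; the exact code shown in Figure 1 may differ.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ROUNDS 32

    static uint32_t zero[ROUNDS], one[ROUNDS];
    static uint32_t set_ones, set_zeros;

    /* Offline preparation step performed by the obfuscator. group1 marks the
     * bits produced by the XOR loop; the remaining (group-2) bits are forced
     * afterwards via set_ones / set_zeros. */
    static void prepare(uint32_t constant, uint32_t group1)
    {
        uint32_t acc = 0;
        for (int i = 0; i < ROUNDS; i++) {
            uint32_t shared = (uint32_t)rand() & group1;   /* identical in both arrays */
            zero[i] = shared | ((uint32_t)rand() & ~group1);
            one[i]  = shared | ((uint32_t)rand() & ~group1);
            acc ^= shared;
        }
        /* Fix up the last element so that the group-1 bits XOR to the constant. */
        uint32_t fix = (acc ^ constant) & group1;
        zero[ROUNDS - 1] ^= fix;
        one[ROUNDS - 1]  ^= fix;

        set_ones  = constant & ~group1;    /* group-2 bits that must end up 1 */
        set_zeros = ~constant & ~group1;   /* group-2 bits that must end up 0 */
    }

    /* The code sequence that ends up in the obfuscated binary. */
    static uint32_t opaque_constant(void)
    {
        uint32_t constant = 0;
        uint32_t unknown = (uint32_t)rand();       /* random value loaded at runtime */
        for (int i = 0; i < ROUNDS; i++) {
            if (unknown & (1u << i))               /* branch depends on the random input */
                constant ^= one[i];
            else
                constant ^= zero[i];
        }
        constant |= set_ones;                      /* force the group-2 bits ...      */
        constant &= ~set_zeros;                    /* ... to their final values        */
        return constant;
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        prepare(0xDEADBEEFu, 0x0F0F0F0Fu);         /* arbitrary constant and bit split */
        printf("0x%08X\n", opaque_constant());     /* always prints 0xDEADBEEF         */
        return 0;
    }

In this sketch, prepare() plays the role of the obfuscator: the group-1 bits of zero[i] and one[i] agree for every i and XOR to the desired constant, while the group-2 bits are random on every path.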
Our experimental results demonstrate that the simple opaque constant calculation is already sufficient to thwart current malware detectors. However, we also explored the design space of opaque constants to identify primitives for which stronger guarantees with regard to robustness against static analysis can be provided. In the following paragraphs, we discuss a primitive that relies on the NP-hardness of the 3-satisfiability problem.

NP-Hard Opaque Constant Calculation   The idea of the following opaque constant is that we encode the instance of an NP-hard problem into a code sequence that calculates our desired constant. That is, we create an opaque constant such that the generation of an algorithm to precisely determine the result of the code sequence would be equivalent to finding an algorithm to solve an NP-hard problem. For our primitive, we have chosen the 3-satisfiability problem (typically abbreviated as 3SAT) as a problem that is known to be hard to solve. The 3SAT problem is a decision problem where a formula in Boolean logic is given in the following form:

(V11 ∨ V12 ∨ V13) ∧ (V21 ∨ V22 ∨ V23) ∧ ... ∧ (Vn1 ∨ Vn2 ∨ Vn3)

where Vij ∈ {v1, ..., vm} and v1, ..., vm are Boolean variables whose value can be either true or false. The task is now to determine if there exists an assignment for the variables vk such that the given formula is satisfied (i.e., the formula evaluates to true). 3SAT has been proven to be NP-complete in [9].

Figure 2. Opaque constant based on 3SAT

Consider the code sequence in Figure 2. In this primitive, we define m Boolean variables v1 . . . vm, which correspond directly to the variables in the given 3SAT formula. By ¬v1 . . . ¬vm, we denote their negations. The pointers V11 to Vn3 refer to the variables used in the various clauses of the formula. In other words, the pointers V11 to Vn3 encode a 3SAT problem based on the variables v1 . . . vm. The loop simply evaluates the encoded 3SAT formula on the input. If the assignment of variables v1 . . . vm does not satisfy the formula, there will always be at least one clause i that evaluates to false. When the check in the loop is evaluated for that specific clause, the result will always be true (as the check is performed against the negation of the clause). Therefore, the opaque constant will be set to 0. On the other hand, if the assignment satisfies the encoded formula, the check performed in the loop will never be true. Therefore, the value of the opaque constant is not overwritten and remains 1.

In the opaque constant presented in Figure 2, the 3SAT problem (that is, the pointers V11 to Vn3) is prepared by the obfuscator. However, the actual assignment of Boolean values to the variables v1 . . . vm is randomly performed during runtime. Therefore, the analyzer cannot immediately evaluate the formula.
The trick of our opaque constant is that the 3SAT problem is prepared such that the formula is not satisfiable. Thus, independent of the actual input, the constant will always evaluate to 0. Of course, when a constant value of 1 should be generated, we can simply invert the result of the satisfiability test. Note that it is possible to efficiently generate 3SAT instances that are not satisfiable with high probability [16]. A static analyzer that aims to exactly determine the possible values of our opaque constant has to solve the instance of the 3SAT problem. Thus, 3SAT is reducible in polynomial time to the problem of exact static analysis of the value of the given opaque constant.

Note that the method presented above only generates one bit of opaque information, but it can be easily extended to create arbitrarily long constants.
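The following C sketch illustrates the structure of such a primitive on a deliberately tiny, unsatisfiable formula; the pointer-based clause encoding and all names are assumptions, and the code in Figure 2 may differ.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Each clause is encoded by three pointers that refer either to a
     * variable v[k] or to its precomputed negation not_v[k]. The formula
     *   (v0|v0|v1) & (v0|v0|!v1) & (!v0|!v0|v1) & (!v0|!v0|!v1)
     * is unsatisfiable, so the function always returns 0, yet a static
     * analyzer that wants to prove this has to solve the encoded instance. */

    enum { M = 2, N = 4 };                 /* m variables, n clauses */

    static bool v[M], not_v[M];

    static const bool *V[N][3] = {         /* pointers V11 ... Vn3 */
        { &v[0],     &v[0],     &v[1]     },
        { &v[0],     &v[0],     &not_v[1] },
        { &not_v[0], &not_v[0], &v[1]     },
        { &not_v[0], &not_v[0], &not_v[1] },
    };

    static int opaque_zero(void)
    {
        /* Random truth assignment chosen at runtime. */
        for (int k = 0; k < M; k++) {
            v[k] = rand() & 1;
            not_v[k] = !v[k];
        }

        int constant = 1;
        for (int i = 0; i < N; i++) {
            /* Check the negation of clause i: an unsatisfied clause
             * forces the constant to 0. */
            if (!(*V[i][0] || *V[i][1] || *V[i][2]))
                constant = 0;
        }
        return constant;                   /* always 0 for an unsatisfiable formula */
    }

    int main(void)
    {
        srand((unsigned)time(NULL));
        printf("%d\n", opaque_zero());     /* prints 0; invert to obtain an opaque 1 */
        return 0;
    }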
Basic Block Chaining   One practical drawback of the 3SAT primitive presented above is that its output has to be the same for all executions, regardless of the actual input. As a result, one can conceive an analysis technique that evaluates the opaque constant function for a few concrete inputs. When all output values are equal, one can assume that this output is the encoded opaque value. To counter this analysis, we introduce a method that we denote basic block chaining.

With basic block chaining, the input for the 3SAT problems is not always selected randomly during runtime. Moreover, we do not always generate unsatisfiable 3SAT instances, but occasionally also insert satisfiable instances. In addition, we ensure that the input that solves a satisfiable formula is provided during runtime. To this end, the input variables v1 . . . vm to the various 3SAT formulas are realized as global variables. At the end of every basic block, these global variables are set in one of the three following ways: (1) to static random values, (2) to random values generated at runtime, or (3) to values specially crafted such that they satisfy a solvable formula used to calculate the opaque constant in the next basic block in the control flow graph.

To analyze a program that is obfuscated with basic block chaining, the analyzer cannot rely on the fact that the encoded formula is always unsatisfiable. Also, when randomly executing a few sample inputs, it is unlikely that the analyzer chooses values that solve a satisfiable formula. The only way to dissect an opaque constant would be to first identify the basic block(s) that precede a certain formula and then determine whether the input values stored in this block satisfy the 3SAT problem. However, finding these blocks is not trivial, as the control flow of the program is obfuscated to make this task difficult (see the following Section 2.2 for more details). Thus, the analysis would have to start at the program entry point and either execute the program dynamically or resort to an approach similar to whole program simulation in which different branches are followed from the start, resolving opaque constants as the analysis progresses. Obviously, our obfuscation techniques fail against such methods, and indeed, this is consistent with an important point that we intend to make in this paper: dynamic analysis techniques are a promising and powerful approach to deal with obfuscated binaries.
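As an illustration of way (3), the following C sketch shows two chained blocks: the first block leaves behind an assignment that satisfies the (satisfiable) formula evaluated by the second block. The clause encoding and all names are assumptions.

    #include <stdbool.h>
    #include <stdio.h>

    #define M 4                       /* Boolean variables  */
    #define N 2                       /* clauses            */

    static bool v[M];                 /* global inputs to the formulas */

    /* block_b's formula: (v0 | v1 | v2) & (!v0 | v2 | v3) */
    static const int  var[N][3] = { {0, 1, 2}, {0, 2, 3} };
    static const bool neg[N][3] = { {false, false, false}, {true, false, false} };

    static int opaque_one(void)       /* 1 iff v[] satisfies the formula */
    {
        int constant = 1;
        for (int i = 0; i < N; i++) {
            bool clause = false;
            for (int j = 0; j < 3; j++)
                clause |= neg[i][j] ? !v[var[i][j]] : v[var[i][j]];
            if (!clause)
                constant = 0;         /* an unsatisfied clause forces 0 */
        }
        return constant;
    }

    static void block_a(void)
    {
        /* ... original instructions of this basic block ... */
        /* Way (3): leave behind a satisfying assignment for block_b. */
        v[0] = true; v[1] = false; v[2] = true; v[3] = false;
    }

    static void block_b(void)
    {
        /* Thanks to block_a, this opaque constant is 1 at runtime,
         * although block_b alone does not reveal this. */
        printf("%d\n", opaque_one());
    }

    int main(void)
    {
        block_a();
        block_b();
        return 0;
    }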
2.2 Obfuscating Transformations

Using opaque constants, we possess a mechanism to load a constant value into a register without the static analyzer knowing its value. This mechanism can be expanded to perform a number of transformations that obfuscate the control flow, data locations, and data usage of a program.
2.2.1 Control Flow Obfuscation

A central prerequisite for the ability to carry out advanced program analysis is the availability of a control flow graph. A Control Flow Graph (CFG) is defined as a directed graph G = (V, E) in which the vertices u, v ∈ V represent basic blocks and an edge e ∈ E : u → v represents a possible flow of control from u to v. A basic block describes a sequence of instructions without any jumps or jump targets in the middle. More formally, a basic block is defined as a sequence of instructions where the instruction in each position dominates, or always executes before, all those in later positions. Furthermore, no other instruction executes between two instructions in the same sequence. Directed edges between blocks represent jumps in the control flow, which are caused by control transfer instructions (CTI) such as calls, conditional jumps, and unconditional jumps.

The idea behind obfuscating the control flow is to replace unconditional jump and call instructions with a sequence of instructions that does not alter the control flow, but makes it difficult to determine the target of the control transfer. In other words, we attempt to make it as difficult as possible for an analysis tool to identify the edges in the control flow graph. Jump and call instructions exist as direct and indirect variants. In the case of a direct control transfer instruction, the target address is provided as a constant operand. To obfuscate such an instruction, it is replaced with a code sequence that does not immediately reveal the value of the jump target to an analyst. To this end, the substituted code first calculates the desired target address using an opaque constant. Then, this value is saved on the stack (along with a return address, in case the substituted instruction was a call). Finally, an x86 ret(urn) operation is performed, which transfers control to the address stored on top of the stack (i.e., the address that is pointed to by the stack pointer). Because the target address was previously pushed there, this instruction is equivalent to the original jump or call operation.
Typically, this measure is enough to effectively avoid the reconstruction of the CFG. In addition, we can also use obfuscation for the return address. When we apply this more complex variant to calls, they become practically indistinguishable from jumps, which makes the analysis of the resulting binary even harder because calls are often treated differently during analysis.
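The transformation itself operates on x86 instructions (pushing the opaquely computed target and executing ret), but its effect can be sketched in C with a function pointer whose value is only available as an opaquely computed integer. The helper opaque_address() below is a stand-in for any of the opaque constant primitives from Section 2.1; all names are assumptions.

    #include <stdint.h>
    #include <stdio.h>

    static void payload(void)
    {
        puts("reached the original call target");
    }

    /* Stand-in for an opaque constant primitive: statically, an analyzer
     * should not be able to tell that this always yields `secret`. */
    static uintptr_t opaque_address(uintptr_t secret)
    {
        return secret;            /* computed as in Figure 1 or 2 in the real scheme */
    }

    int main(void)
    {
        /* Original code:       call payload
         * Obfuscated version:  the target exists only as an opaquely computed
         * value. (The pointer/integer casts are implementation-defined but work
         * on mainstream platforms; in a real binary this is a push of the opaque
         * target followed by ret.) */
        uintptr_t target = opaque_address((uintptr_t)&payload);
        void (*fp)(void) = (void (*)(void))target;
        fp();                     /* equivalent to the original call */
        return 0;
    }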
2.2.2 Data Location Obfuscation

The location of a data element is often specified by providing a constant, absolute address or a constant offset relative to a particular register. In both cases, the task of a static analyzer can be complicated if the actual data element that is accessed is hidden.

When accessing a global data element, the compiler typically generates an operation that uses the constant address of this element. To obfuscate this access, we first generate code that uses an opaque constant to store the element’s address in a register. In a second step, the original operation is replaced by an equivalent one that uses the address in the register instead of directly addressing the data element. Accesses to local variables can be obfuscated in a similar fashion. Local variable access is typically achieved by using a constant offset that is added to the value of the base pointer register, or by subtracting a constant offset from the stack pointer. In both cases, this offset can be loaded into a register by means of an opaque constant primitive. Then, the now unknown value (from the point of view of the static analyzer) is used as offset to the base or stack pointer.

Indirect function calls and indirect jumps provide another opportunity to apply data location obfuscation. Modern operating systems make heavy use of the concept of dynamically linked libraries. With dynamically linked libraries, a program specifies a set of library functions that are required during execution. At program start-up, the dynamic linker maps these requested functions into the address space of the running process. The linker then populates a table (called import table or procedure linkage table) with the addresses of the loaded functions. The only thing a program has to do to access a library function during runtime is to jump to the corresponding address stored in the import table. This “jump” is typically realized as an indirect function call in which the actual target address of the library routine is taken from a statically known address, which corresponds to the appropriate table entry for this function.

Because the address of the import table entry is encoded as a constant in the program code, dynamic library calls yield information on what library functions a program actively uses. Furthermore, such calls also reveal the important information of where these functions are called from. Therefore, we decided to obfuscate import table entry addresses as well. To this end, the import table entry address is first loaded into a register using an opaque constant. After this step, a register-indirect call is performed.
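Conceptually, the transformation replaces a direct, constant-address access with an access through an opaquely computed pointer, as in the following C sketch (opaque_address() again stands in for an opaque constant primitive; the names are assumptions).

    #include <stdint.h>
    #include <stdio.h>

    static int counter;                       /* a global data element */

    static uintptr_t opaque_address(uintptr_t secret)
    {
        return secret;                        /* opaque in the real scheme */
    }

    int main(void)
    {
        /* Original access:    counter = counter + 1;
         * Obfuscated version: the address of `counter` never appears as a
         * plain constant operand of the memory access. */
        int *p = (int *)opaque_address((uintptr_t)&counter);
        *p = *p + 1;

        printf("%d\n", counter);              /* prints 1 */
        return 0;
    }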
2.2.3 Data Usage Obfuscation

With data location obfuscation, we can obfuscate memory accesses to local and global variables. However, once values are loaded into processor registers, they can be precisely tracked. For example, when a function returns a value, this return value is typically passed through a register. When the value remains in the register and is later used as an argument to another function call, the static analyzer can establish this relationship. The problem from the point of view of the obfuscator is that a static analysis tool can identify define-use chains for values in registers. That is, the analyzer can identify when a value is loaded into a register and when it is used later.

To make the identification of define-use chains more difficult, we obfuscate the presence of values in registers. To this end, we insert code that temporarily spills register content to an obfuscated memory location and later reloads it. This task is accomplished by first calculating the address of a temporary storage location in memory using an opaque constant. We then save the register to that memory location and delete its content. Some time later, before the content of the register is needed again, we use another opaque constant primitive to construct the same address and reload the register. For this process, unused sections of the stack are chosen as temporary storage locations for spilled register values.

After this obfuscation mechanism is applied, a static analysis can only identify two unrelated memory accesses. Thus, this approach effectively extends the uncertainty inherent in memory accesses to values held in registers.
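The following C sketch models this spill-and-reload idea; in the real transformation the two addresses are produced by two independent opaque constant computations and the storage location lies in an unused part of the stack. All names are assumptions.

    #include <stdint.h>
    #include <stdio.h>

    static int scratch[16];                    /* stands in for an unused stack area */

    static uintptr_t opaque_address(uintptr_t secret)
    {
        return secret;                         /* opaque in the real scheme */
    }

    static int  compute(void)   { return 42; }
    static void consume(int x)  { printf("%d\n", x); }

    int main(void)
    {
        int value = compute();                 /* value "lives in a register" */

        /* Spill through an opaquely computed address and kill the register. */
        *(int *)opaque_address((uintptr_t)&scratch[4]) = value;
        value = 0;

        /* ... unrelated instructions ... */

        /* Reload through a second opaque constant yielding the same address. */
        value = *(int *)opaque_address((uintptr_t)&scratch[4]);
        consume(value);                        /* the define-use chain is broken for a static analyzer */
        return 0;
    }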
3 Binary Transformation

To verify the effectiveness and robustness of the presented code obfuscation methods on real-world binaries, it was necessary to implement a binary rewriting tool that is capable of changing the code of arbitrary binaries without assuming access to source code or program information (such as relocation or debug information).

We did consider implementing our obfuscation techniques as part of the compiler tool-chain. This task would have been easier than rewriting existing binaries, as the compiler has full knowledge about the code and data components of a program and could insert obfuscation primitives during code generation. Unfortunately, using a compiler-based approach would have meant that it would not have been possible to apply our code transformations to real-world malware (except the few for which source code is available on the net). Also, the ability to carry out transformations directly on binary programs highlights the threat that code obfuscation techniques pose to static analyzers. When a modified compiler is required for obfuscation, a typical argument that is brought forward is that the threat is hypothetical because it is difficult to bundle a complete compiler with a malware program. In contrast, shipping a small binary rewriting engine together with malicious code is more feasible for miscreants.
When we apply the transformations presented in this paper to a binary program, the structure of the program changes significantly. This is because the code that is being rewritten requires a larger number of instructions after obfuscation, as single instructions get substituted by obfuscation primitives. To make room for the new instructions, the existing code section is expanded and instructions are shifted. This has important consequences. First, instructions that are targets of jump or call operations are relocated. As a result, the operands of the corresponding jump and call instructions need to be updated to point to these new addresses. Note that this also affects relative jumps, which do not specify a complete target address, but only an offset relative to the current address. Second, when expanding the code section, the adjacent data section has to be moved too. Unfortunately for the obfuscator, the data section often contains complex data structures that define pointers that refer to other locations inside the data section. All these pointers need to be adjusted as well.

Before instructions and their operands can be updated, they need to be identified. At first glance, this might sound straightforward. However, this is not the case: the variable length of x86 instructions and the fact that code and data elements are mixed in the code section make perfect disassembly a difficult challenge.
In our system, we use a recursive traversal disassembler. That is, we start by disassembling the program at the program entry point specified in the program header. We disassemble the code recursively until every reachable procedure has been processed. After that, we focus on the remaining unknown sections. For these, we use a number of heuristics to recognize them as possible code. These heuristics include the use of byte signatures to identify function prologues or jump tables. Whenever a code region is identified, the recursive disassembler is restarted there. Otherwise, the section is declared as data.
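A high-level sketch of this recursive traversal, written in C with a placeholder instruction decoder, is shown below; the data structures and names are assumptions rather than the actual implementation.

    #include <stdbool.h>
    #include <stdint.h>

    #define IMAGE_SIZE  0x10000
    #define MAX_WORK    4096

    typedef struct {
        int      length;       /* encoded length in bytes            */
        bool     is_cti;       /* control transfer instruction?      */
        bool     has_target;   /* direct target known?               */
        uint32_t target;       /* direct jump/call target address    */
        bool     stops_flow;   /* ret or unconditional jump          */
    } insn_t;

    /* Placeholder: a real implementation decodes one x86 instruction here. */
    bool decode_insn(const uint8_t *image, uint32_t addr, insn_t *out);

    static bool is_code[IMAGE_SIZE];   /* bytes recognized as code */

    void recursive_traversal(const uint8_t *image, uint32_t entry_point)
    {
        uint32_t worklist[MAX_WORK];
        int top = 0;
        worklist[top++] = entry_point;

        while (top > 0) {
            uint32_t addr = worklist[--top];

            /* Follow the instruction stream until flow stops or known code is reached. */
            while (addr < IMAGE_SIZE && !is_code[addr]) {
                insn_t insn;
                if (!decode_insn(image, addr, &insn))
                    break;                              /* undecodable: leave for the heuristics */
                for (int k = 0; k < insn.length && addr + k < IMAGE_SIZE; k++)
                    is_code[addr + k] = true;

                if (insn.is_cti && insn.has_target && top < MAX_WORK)
                    worklist[top++] = insn.target;      /* newly discovered code region */
                if (insn.stops_flow)
                    break;                              /* ret or unconditional jump    */
                addr += insn.length;
            }
        }
        /* Regions that remain unmarked are afterwards examined with heuristics
         * (function prologue byte signatures, jump tables) or declared as data. */
    }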
Our rewriting tool targets both the Linux ELF and the Windows PE file formats. Using the recursive disassembler approach and our heuristics, our binary rewriting tool is able to correctly obfuscate many (although not all) real-world binaries. More detailed results on the robustness of the tool are provided in Section 4.
4 Evaluation

In this section, we present experimental results and discuss our experiences with our obfuscation tool. In particular, we assess how effective the proposed obfuscation techniques are in evading malware detectors. In addition, we analyze the robustness of our binary rewriting tool by processing a large number of Linux and Windows applications.
4.1 Evasion Capabilities

To demonstrate that the presented obfuscation methods can be used to effectively change the structure of a binary so that static analysis tools fail to recognize the obfuscated code, we conducted tests with real-world malware. We used our tool to morph three worm programs and then analyzed the obfuscated binaries using an advanced static analysis tool [10] as well as four popular commercial virus scanners.

The malware samples that we selected for our experiments were the A and F variants of the MyDoom worm and the A variant of the Klez worm. We chose these samples because they were used in the evaluation of the advanced static analysis tool in [10]. Thus, the tool was equipped with appropriate malware specifications to detect these worms. In order to obfuscate the malicious executables, we deployed the evasion techniques introduced in Section 2, using both the simple opaque constants and the one based on the 3SAT problem.
Commercial Virus Scanners: First, we tested the possibilities to evade detection by popular virus scanners. To evaluate the effectiveness of our obfuscation methods, we selected the following four popular anti-virus applications: McAfee Anti-Virus, Kaspersky Anti-Virus Personal, AntiVir Personal Edition, and Ikarus Virus Utilities.

Before the experiment, we verified that all scanners correctly identified the worms. Then, we obfuscated the three malicious code samples, ensured that the malware was still operating correctly, and ran the virus scanners on them. The results are shown in Table 1. In this table, an “X” indicates that the scanner was no longer able to detect the malware.
             Klez.A    MyDoom.A    MyDoom.AF
McAfee       X
Kaspersky    X         X           X
AntiVir                            X
Ikarus       X         X           X

Table 1. Evasion results for four commercial virus scanners
The results demonstrate that after the obfuscation process, the scanners from Kaspersky and Ikarus were not able to detect any of the malware instances. Surprisingly for us, however, the scanners from McAfee and AntiVir were still able to detect two out of three worms. Closer examination revealed that the scanner from McAfee detects the two obfuscated samples because of a virus signature that is based on parts of the data section. When we overwrote the bytes in the data section that were being used as a signature, the McAfee scanner could neither detect the original nor the obfuscated version of the malware anymore. In contrast, the AntiVir scanner uses a combination of both a data and a code signature to detect the worms. We were able to track down the data signature for both Klez.A and MyDoom.A to a few bytes in the data section. If any of these bytes in the data section was modified in the obfuscated binary, the detection by the virus scanner was successfully evaded. Indeed, it is relatively easy for malicious code to encrypt the data section using a different key for each instance. Hence, data signatures are not too difficult to evade.
Advanced Malware Detection (Model Checking): Because it is widely known that existing commercial virus scanners typically employ pattern-based signatures, the ability to evade their detection is not too surprising. In order to verify the efficiency of our obfuscation techniques on a more advanced malware detector, we obtained the system presented in [10] from its authors. This detector first creates a disassembly of the binary under analysis by using IDA Pro [7]. Then, model checking is used to search for the existence of a generic code template that characterizes malicious behavior. In particular, the tool attempts t