SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 25(7), 811–829 (JULY 1995)
Decompilation of Binary Programs

CRISTINA CIFUENTES∗ AND K. JOHN GOUGH
(email: cifuente@fit.qut.edu.au gough@fit.qut.edu.au)
School of Computing Science, Queensland University of Technology, GPO Box 2434, Brisbane QLD 4001, Australia
SUMMARY

The structure of a decompiler is presented, along with a thorough description of the different modules that form part of a decompiler, and the type of analyses that are performed on the machine code to regenerate high-level language code. The phases of the decompiler have been grouped into three main modules: front-end, universal decompiling machine, and back-end. The front-end is a machine-dependent module that performs the loading, parsing and semantic analysis of the input program, as well as generating an intermediate representation of the program. The universal decompiling machine is a machine- and language-independent module that performs data and control flow analysis of the program based on the intermediate representation, and the program’s control flow graph. The back-end is a language-dependent module that deals with the details of the target high-level language.

In order to increase the readability of the generated programs, a decompiling system has been implemented which integrates a decompiler, dcc, and an automatic signature generator, dccSign. Signatures for libraries and compilers are stored in a database that is read by the decompiler; thus, the generated programs can make use of known library names, such as WriteLn() and printf().

dcc is a decompiler for the Intel 80286 architecture and the DOS operating system. dcc takes as input binary programs from a DOS environment and generates C programs as output. Sample code produced by this decompiler is given.

KEY WORDS: decompiler; reverse compiler; compiler signature; library signature; i80286; C language
INTRODUCTION

A decompiler, or reverse compiler, is a program that attempts to perform the inverse process of the compiler: given an executable program compiled in any high-level language, the aim is to produce a high-level language program that performs the same function as the executable program. Thus, the input is machine dependent, and the output is language dependent.

Several practical problems are faced when writing a decompiler. The main problem derives from the representation of data and instructions in the Von Neumann architecture: they are indistinguishable. Thus, data can be located in between instructions, as in many implementations of indexed jump (case) tables. This representation, together with the practice of self-modifying code, makes it hard to decompile a binary program.

Another problem is the great number of subroutines introduced by the compiler and the linker. The compiler will always include start-up subroutines that set up its environment,
∗ Present address: Department of Computer Science, University of Tasmania, GPO Box 252C, Hobart TAS 7001, Australia. (Email: C.N.Cifuentes@cs.utas.edu.au)

CCC 0038–0644/95/070811–19
© 1995 by John Wiley & Sons, Ltd.

Received April 1994
Revised January 1995
and runtime support routines whenever required. These routines are normally written in assembler and in most cases are untranslatable into a higher-level representation. Also, most operating systems do not provide a mechanism to share libraries; consequently, binary programs are self-contained and library routines are bound into each binary image. Library routines are written either in the language the compiler was written in or in assembler. This means that a binary program contains not only the routines written by the programmer, but a great number of other routines linked in by the linker. As an example of the number of extra subroutines present in a binary program, a ‘hello world’ program compiled in C generates 23 different procedures. The same program compiled in Pascal generates more than 40 procedures. All we are really interested in is the one procedure, main.

Despite the above-mentioned limitations, there are several uses for a decompiler, including two major software areas: maintenance of code and software security. From a maintenance point of view, a decompiler is an aid in the recovery of lost source code, the migration of applications to a new hardware platform, the translation of code written in an obsolete language into a newer language, the structuring of old code written in an unstructured way (e.g. ‘spaghetti’ code), and a debugging tool that helps in finding and correcting bugs in an existing binary program. From a security point of view, a binary program can be checked for the existence of malicious code (e.g. viruses) before it is run for the first time on a computer; and in safety-critical systems where the compiler is not trusted, the binary program can be validated to do exactly what the original high-level language program intended to do, thus verifying the output of the compiler.

Different attempts at writing decompilers have been made in the last 20 years. Due to the amount of information lost in the compilation process, all of these experimental decompilers have limitations in one way or another if they are to regenerate high-level language (HLL) code, including decompilation of assembly files1,2,3,4,5 or object files with or without symbolic debugging information,6,7 simplified high-level language,1 and the requirement of the compiler’s specification.8,9 Assembly programs have helpful data information in the form of symbolic text, such as data segments, data and type declarations, subroutine names, subroutine entry points, and subroutine exit statements. All this information can be collected in a symbol table, and the decompiler would not need to address the problem of separating data from instructions, or the naming of variables and subroutines. Object files with debugging information contain the program’s symbol table as constructed by the compiler. Given the symbol table, it is easy to determine which memory locations are instructions, as there is certainty about which memory locations represent data. In general, object files contain more information than binary files. Finally, the requirement to have access to the compiler’s specifications is impractical, as these specifications are not normally disclosed by compiler manufacturers, or do not even exist.
Our decompiler, dcc, differs from previous decompilation projects in several ways; it analyses binary programs rather than assembler or object files, performs idiom∗ analysis to capture the essence of a sequence of instructions with a special meaning, performs data flow analysis on registers and condition codes to eliminate them, and structures the program’s control flow graph into a generic set of high-level structures that can be accommodated into different high-level languages, eliminating as much as possible the use of the goto statement.

The rest of this paper is structured in the following way: a thorough description of the structure of a decompiler, followed by the description of our implementation of an

∗ An idiom is a sequence of instructions that forms a logical entity and has a meaning that cannot be derived by considering the primary meanings of the individual instructions.
binary program → Front-end (machine dependent) → UDM (analysis) → Back-end (language dependent) → HLL program

Figure 1. Decompiler modules
automatic decompiling system, and conclusions. The paper is followed by the definitions of graph theoretical concepts used throughout the paper (Appendix I), and sample output from different phases of the decompilation of a program (Appendix II).
THE DECOMPILER STRUCTURE

A decompiler can be structured in a similar way to a compiler, that is, as a series of modules that deal with machine- or language-dependent features. Three main modules are required: a machine-dependent module that reads in the program, loads it into virtual memory and parses it (the front-end), a machine- and language-independent module that analyses the program in memory (the universal decompiling machine), and a language-dependent module that writes formatted output for the target language (the back-end) (see Figure 1). This modular representation makes it easier to write decompilers for different machine/target language pairs, by writing different front-ends for different machines, and different back-ends for different target languages. This result is true in theory, but in practical applications it is always limited by the generality of the intermediate language used.
The front-end

The front-end module deals with machine-dependent features and produces a machine-independent representation. It takes as input a binary program for a specific machine, loads it into virtual memory, parses it, and produces an intermediate representation of the program, as well as the program’s control flow graph (see Figure 2).

The parser disassembles code starting at the entry point given by the loader, and follows
binary program → Loader → Parser → Semantic analysis → low-level intermediate code + control flow graph

Figure 2. Front-end phases
the instructions sequentially until a change in flow of control is met. Flow of control is changed due to a conditional, unconditional or indexed branch, or a procedure call, in which case the target branch label(s) start new instruction paths to be disassembled. A path is finished by a return instruction or program termination. All instruction paths are followed in a recursive manner. Problems are introduced by machine instructions that use indexed or indirect addressing modes. To handle these, heuristic methods are implemented. For example, while disassembling code, the parser must check for sequences of instructions that represent a multiway branch (e.g. a switch statement), normally implemented as an indexed jump on a jump table.10,11 Finally, the intermediate code is generated and the control flow graph is built.
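To make the recursive traversal concrete, the following is a minimal sketch in C, assuming a machine-specific decode() routine and our own Insn/IClass types; it is an illustration of the technique, not dcc's actual parser.

/* Minimal sketch of recursive-traversal disassembly; decode() and the
 * instruction classification are assumed, and are not dcc's code. */
#include <stdbool.h>
#include <stdint.h>

#define IMAGE_SIZE 0x10000

typedef enum { I_FALLTHROUGH, I_JUMP, I_JCOND, I_CALL, I_RET, I_INDEXED } IClass;

typedef struct {
    IClass   class;     /* kind of control transfer             */
    int      length;    /* size of the instruction in bytes     */
    uint32_t target;    /* branch or call target, if any        */
} Insn;

extern Insn decode(const uint8_t *image, uint32_t off);   /* machine specific    */
static bool visited[IMAGE_SIZE];                           /* bytes already parsed */

/* Follow one instruction path until a return or an already-visited byte. */
static void parse_path(const uint8_t *image, uint32_t off)
{
    while (off < IMAGE_SIZE && !visited[off]) {
        Insn i = decode(image, off);
        visited[off] = true;

        switch (i.class) {
        case I_RET:                    /* path ends here                     */
            return;
        case I_JUMP:                   /* continue at the branch target      */
            off = i.target;
            break;
        case I_JCOND:                  /* two paths: target and fall-through */
        case I_CALL:
            parse_path(image, i.target);
            off += i.length;
            break;
        case I_INDEXED:                /* jump table: the heuristics would
                                          enumerate its entries here         */
            return;
        default:                       /* ordinary instruction               */
            off += i.length;
            break;
        }
    }
}

Starting with parse_path(image, entry_point) visits exactly the byte ranges reachable as code, which is what allows data interspersed between instructions to be left untouched.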
Two levels of intermediate code are required: a low-level representation that resembles the assembler of the machine, and a higher-level representation that resembles statements of a high-level language. The initial level, or low-level intermediate code, is a simple m : 1 mapping of machine instructions to assembler mnemonics. Compound instructions (such as rep movs) are represented by a unique low-level intermediate instruction (e.g. rep_movs). The second level, or high-level intermediate code, is generated by the interprocedural data flow analysis, explained later, and maps n : 1 low-level to high-level instructions. The front-end generates a low-level representation.
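Purely as an illustration of the two levels (the type and field names below are our own assumptions, not dcc's internal representation), the low level can be pictured as one record per assembler mnemonic and the high level as records that carry expressions instead of registers:

/* Illustrative sketch of the two intermediate-code levels; the names are
 * ours and do not reflect dcc's internals. */

typedef struct Expr Expr;        /* expression tree built by the data flow analysis */

typedef enum { LOW_MOV, LOW_ADD, LOW_CMP, LOW_JCOND, LOW_CWD,
               LOW_DIV, LOW_MOD, LOW_REP_MOVS /* compound rep movs */ } LowOp;

typedef struct {                 /* low level: roughly one record per machine instruction */
    LowOp op;
    int   dst, src;              /* register numbers or stack offsets */
} LowIcode;

typedef enum { HL_ASGN, HL_JCOND, HL_CALL, HL_RET } HighOp;

typedef struct {                 /* high level: operands are expressions */
    HighOp op;
    Expr  *lhs, *rhs;
} HighIcode;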
The semantic analysis phase performs idiom analysis and type propagation. Idioms are replaced by an appropriate functionally equivalent intermediate instruction. For example, Figure 3 illustrates two different idioms: the one on the left-hand side is a negation of a long variable, represented in this case by the registers dx:ax. The idiom on the right-hand side is the prologue code of a high-level language procedure. In this case, space for 6 bytes is being reserved on the stack for local variables. There are a number of different idioms
neg dx                 push bp
neg ax                 mov bp, sp
sbb dx, 0              sub sp, 6

    ⇓                      ⇓

neg dx:ax              enter 6,0

Figure 3. Sample idioms and their transformation
widely known in the compiler community, and the decompiler must code them in order to generate clearer high-level code.
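As a standalone, hedged sketch of how the prologue idiom of Figure 3 might be recognized over an array of low-level intermediate instructions (the types and opcode names are illustrative assumptions, not dcc's internals):

/* Sketch of idiom recognition on low-level intermediate code. */
#include <stdbool.h>

typedef enum { LOW_PUSH, LOW_MOV, LOW_SUB, LOW_ENTER } LowOp;
typedef enum { R_AX, R_DX, R_BP, R_SP, R_IMM } Operand;

typedef struct {
    LowOp   op;
    Operand dst, src;
    int     imm;                 /* immediate value when src == R_IMM */
} LowIcode;

/* Recognize the procedure-prologue idiom of Figure 3:
 *     push bp / mov bp, sp / sub sp, N   ==>   enter N, 0
 * Returns true and rewrites the first instruction when the idiom matches. */
static bool match_enter_idiom(LowIcode *code, int i, int n)
{
    if (i + 2 >= n)
        return false;
    if (code[i].op     == LOW_PUSH && code[i].dst     == R_BP &&
        code[i + 1].op == LOW_MOV  && code[i + 1].dst == R_BP &&
                                      code[i + 1].src == R_SP &&
        code[i + 2].op == LOW_SUB  && code[i + 2].dst == R_SP &&
                                      code[i + 2].src == R_IMM) {
        code[i].op  = LOW_ENTER;          /* one equivalent instruction...   */
        code[i].imm = code[i + 2].imm;    /* ...carrying the number of bytes */
        /* instructions i+1 and i+2 would be marked dead and removed */
        return true;
    }
    return false;
}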
Type information is propagated after the idioms have been recognized. For example, if a long local variable is found at stack offsets -1 to -4, all references to [bp-2] and [bp-4] must be merged into references to such a long variable, e.g. [bp-2]:[bp-4]. Other type information can be propagated in the same way, such as fields (offsets) of a record.
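For illustration only, the merge can be recorded in a table of locals so that later references to either half resolve to the same variable; the structures below are hypothetical and do not mirror dcc's symbol table:

/* Hypothetical sketch of long-variable merging during type propagation. */
#define MAX_LOCALS 64

typedef enum { T_WORD, T_LONG } LocalType;

typedef struct {
    int       off;       /* most negative bp offset, e.g. -4 for [bp-2]:[bp-4] */
    int       size;      /* 2 for a word, 4 for a long                         */
    LocalType type;
} Local;

static Local locals[MAX_LOCALS];
static int   nlocals;

/* Once an idiom (e.g. the long negation of Figure 3) reveals that the words
 * at [bp-2] and [bp-4] form one long, register a single 4-byte local ...   */
static int add_long_local(int off_lo)
{
    locals[nlocals].off  = off_lo;
    locals[nlocals].size = 4;
    locals[nlocals].type = T_LONG;
    return nlocals++;
}

/* ... and map any later reference [bp+off] to that local, so that both
 * halves are folded into references to the same long variable.           */
static int local_for_offset(int off)
{
    for (int i = 0; i < nlocals; i++)
        if (off >= locals[i].off && off < locals[i].off + locals[i].size)
            return i;
    return -1;            /* not a known local */
}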
An optimization phase is performed on the control flow graph as well. Due to the nature of machine code instructions, the compiler might need to introduce intermediate branches in an executable program, because there is no machine instruction capable of branching more than a certain maximum distance in bytes (architecture dependent). An optimization pass over the control flow graph removes this redundancy, by replacing the target branch location of all conditional or unconditional jumps that branch to an unconditional jump (and, recursively, any chains of such branches) with the final target basic block. While performing this process, some basic blocks are no longer referenced, as they were used only for intermediate branches. These nodes must be eliminated from the graph as well.
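A minimal sketch of this jump-chain elimination, assuming our own basic-block representation rather than dcc's:

/* Sketch of jump-chain elimination on the control flow graph. */
typedef struct BB BB;
struct BB {
    int  kind;            /* BB_JUMP marks a block holding only an unconditional jump */
    int  nsucc;           /* number of out-edges (1 or 2)                             */
    BB  *succ[2];         /* successor basic blocks                                   */
    int  refs;            /* number of in-edges                                       */
};

#define BB_JUMP 1

/* Follow a chain of jump-only blocks to its final destination. */
static BB *final_target(BB *b)
{
    while (b->kind == BB_JUMP && b->nsucc == 1 && b->succ[0] != b)
        b = b->succ[0];
    return b;
}

/* Redirect every out-edge past intermediate jump blocks; blocks whose
 * reference count drops to zero can then be removed from the graph.    */
static void remove_jump_chains(BB **blocks, int n)
{
    for (int i = 0; i < n; i++)
        for (int e = 0; e < blocks[i]->nsucc; e++) {
            BB *t = final_target(blocks[i]->succ[e]);
            if (t != blocks[i]->succ[e]) {
                blocks[i]->succ[e]->refs--;    /* the intermediate block loses an in-edge */
                t->refs++;
                blocks[i]->succ[e] = t;
            }
        }
}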
The universal decompiling machine

The universal decompiling machine (UDM) is an intermediate module that is totally machine and language independent. It deals with flow graphs and the intermediate representation of the program, and performs all the flow analysis the input program needs (see Figure 4).
Data flow analysis

The aim of the data flow analysis phase is to transform the low-level intermediate representation into a higher-level representation that resembles a HLL statement. It is therefore necessary to eliminate the concept of condition codes (or flags) and registers, as these concepts do not exist in high-level languages, and to introduce the concept of expressions, as these can be used in any HLL program. For this purpose, the technology of compiler optimization has been appropriated.

The first analysis is concerned with condition codes. Some condition codes are used only by hand-crafted assembly code instructions, and thus are not translatable to a high-level representation. Therefore, condition codes are classified into two groups: HLCC, the set of condition codes that are likely to have been generated by a compiler (e.g. overflow, carry), and NHLCC, the set of condition codes that are likely to have been generated by assembly code (e.g. trap, interrupt). The HLCC set is the one that can
control flow graph + low-level intermediate code → Data flow analysis → Control flow analysis → high-level intermediate code + structured control flow graph

Figure 4. UDM phases
be analysed further. Instructions that use condition codes in the NHLCC set mean that the subroutine is most likely not high-level, and it is therefore flagged as such; assembler is all that can be generated for these subroutines.

A use/definition, or reaching definition, analysis is performed on condition codes. In this way, for a given use of a condition code c at instruction j, the use/definition chain (ud-cc()) determines which instruction(s) defined this condition. In practical cases, there is always only one instruction that defined the condition(s) used in the instruction at j (i.e. ud-cc(c) = {i}). This set can be computed by a forward-flow any-path algorithm.12,13 Once the set of defined/used instructions is known (i.e. (i, j)), these low-level instructions can be replaced by a unique conditional high-level instruction that is semantically equivalent to the given instructions. Let us consider the example in Figure 5, which illustrates this point. Instruction 2 uses the sign (SF) and zero (ZF) condition codes. These flags were previously defined by instruction 1, which also defines the carry flag (CF). Given that this instruction defines all flags used by the conditional jump, they can be merged into one high-level instruction that compares the registers for the greater condition; i.e. JCOND (ax > bx).
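As a sketch of this merging step (with our own instruction record, not dcc's intermediate code), the pair (i, j) delivered by ud-cc can be folded as follows:

/* Sketch of folding a flag-defining instruction into its conditional jump,
 * as in Figure 5. */
typedef enum { CC_SF = 1, CC_ZF = 2, CC_CF = 4 } CondCode;     /* flag bit mask */
typedef enum { LOW_CMP, LOW_JCOND, HL_JCOND_EXPR } Op;

typedef struct {
    Op  op;
    int def_cc, use_cc;   /* flags defined / used by this instruction        */
    int opnd1, opnd2;     /* operands of the cmp, carried into the JCOND     */
    int cond;             /* relational operator of the jump (>, <, ==, ...) */
    int dead;             /* set when the instruction is subsumed            */
} Icode;

/* Given ud-cc(j) = {i}: if instruction i defines every flag that the
 * conditional jump j uses, fold the pair into one high-level JCOND. */
static void merge_cmp_jcond(Icode *code, int i, int j)
{
    if (code[i].op == LOW_CMP && code[j].op == LOW_JCOND &&
        (code[j].use_cc & ~code[i].def_cc) == 0) {
        code[j].op    = HL_JCOND_EXPR;        /* JCOND (opnd1 <cond> opnd2)  */
        code[j].opnd1 = code[i].opnd1;
        code[j].opnd2 = code[i].opnd2;
        code[i].dead  = 1;                    /* the cmp is no longer needed */
    }
}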
The second analysis is concerned with registers. The elimination of temporary registers leads to the elimination of intermediate instructions, by replacing several low-level instructions with one high-level instruction that does not make use of the intermediate register. As with condition codes, some machine instructions are hand-crafted by assembler programmers, and are untranslatable to a higher representation. We therefore classify the low-level instructions into two sets: HLI, the set of instructions that are representable in a high-level language (e.g. mov, add), and NHLI, the set of instructions that are likely to be generated only by assembly code (e.g. cli, ins). This analysis is concerned only with instructions from the HLI set. Instructions from the NHLI set belong to subroutines
1  cmp ax, bx    ; def: SF,ZF,CF
2  jg  labZ      ; use: SF,ZF      ; ud-cc(SF,ZF) = {1}

        ⇓

JCOND (ax > bx)

Figure 5. Condition code example
that are likely to have been written in assembler, or are untranslatable, and are therefore flagged as such; assembler is generated for them.

Two preliminary analyses are required for the elimination of registers. A definition/use analysis on registers is needed to determine how many uses there are for a definition of a register. Note that register variables are not eliminated by this analysis, as they represent local variables and thus are required in the final output program. This analysis can be solved as a backward-flow any-path problem.12,13 Also, an interprocedural live register analysis is required, to ascertain which registers are live on entrance to and on exit from a basic block. This analysis is also solved by known algorithms.12,13
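As a reminder of the standard technique (a generic sketch rather than dcc's implementation), live registers can be computed per basic block with the usual backward data flow iteration, where live_in = use ∪ (live_out - def) and live_out is the union of the successors' live_in sets:

/* Generic sketch of iterative live-register analysis; register sets are
 * bit masks and the basic-block layout is our own assumption. */
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCC 2

typedef struct {
    uint32_t use, def;            /* registers used before defined / defined here */
    uint32_t live_in, live_out;   /* computed liveness sets                       */
    int      nsucc;
    int      succ[MAX_SUCC];      /* indices of successor blocks                  */
} BasicBlock;

static void live_register_analysis(BasicBlock *bb, int n)
{
    bool changed = true;
    while (changed) {                        /* iterate to a fixed point          */
        changed = false;
        for (int i = n - 1; i >= 0; i--) {   /* backward problem: reverse order   */
            uint32_t out = 0;
            for (int s = 0; s < bb[i].nsucc; s++)
                out |= bb[bb[i].succ[s]].live_in;
            uint32_t in = bb[i].use | (out & ~bb[i].def);
            if (in != bb[i].live_in || out != bb[i].live_out) {
                bb[i].live_in  = in;
                bb[i].live_out = out;
                changed = true;
            }
        }
    }
}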
The method to eliminate registers has been named forward substitution: by performing a forward substitution of the symbolic contents of a defined register that is used only once into the instruction that uses it, the temporary register is eliminated, the temporary instruction that defined the register is also eliminated, and the final instruction that used the register is now defined in terms of an expression. Let us consider a modulo 10 example, described in Figure 6. The registers si and di are register variables in this example. Register tmp is a virtual register introduced by the parser whenever a DIV instruction is found, as DIV is equivalent to two low-level instructions: a division and a modulus. Because they use the same registers as operands, and they redefine these registers, a tmp register is introduced
..  ...              ; other code here
28  MOV ax, di       ; ASGN ax, di        ; du(ax) = {30}
29  MOV bx, 0Ah      ; ASGN bx, 0Ah       ; du(bx) = {32,33}
30  CWD              ; ASGN dx:ax, ax     ; du(ax) = {31}, du(dx) = {31}
31  MOV tmp, dx:ax   ; ASGN tmp, dx:ax    ; du(tmp) = {32,33}
32  DIV bx           ; ASGN ax, tmp / bx  ; du(ax) = {}
33  MOD bx           ; ASGN dx, tmp % bx  ; du(dx) = {34}
34  MOV si, dx       ; ASGN si, dx
..  ...              ; other code here, no use of ax

        ⇓

ASGN si, di % 0Ah

Figure 6. Simple expression example
to hold the initial value of dx:ax. The substitution up to instruction 32 is illustrated in the following code:
28  MOV ax, di       ; ASGN ax, di
29  MOV bx, 0Ah      ; ASGN bx, 0Ah
30  CWD              ; ASGN dx:ax, di      (substitute 28->30)
31  MOV tmp, dx:ax   ; ASGN tmp, di        (substitute 30->31)
32  DIV bx           ; ASGN ax, tmp / bx   (eliminate instruction)
Instruction 32 defines a register that is not going to be used; therefore this instruction is redundant and must be eliminated. Any uses of the registers on the right-hand side of the redundant instruction need to be backpatched to reflect the non-existence of the instruction. After this step, the code would look like this:
29  MOV bx, 0Ah      ; ASGN bx, 0Ah       ; du(bx) = {33}
31  MOV tmp, dx:ax   ; ASGN tmp, di
33  MOD bx           ; ASGN dx, tmp % bx
34  MOV si, dx       ; ASGN si, dx
and the final substitutions would give the final result in instruction 34:
29  MOV bx, 0Ah      ; ASGN bx, 0Ah
31  MOV tmp, dx:ax   ; ASGN tmp, di
33  MOD bx           ; ASGN dx, di % 0Ah   (substitute 31->33) (substitute 29->33)
34  MOV si, dx       ; ASGN si, di % 0Ah   (substitute 33->34)
A complete algorithm that describes this data flow analysis can be found in References 14 and 15.
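A simplified sketch of forward substitution over expression trees follows; the types are our own illustration of the idea, and not the full algorithm of References 14 and 15:

/* Sketch of forward substitution over high-level intermediate code. */
#include <stddef.h>

typedef struct Expr Expr;
struct Expr {
    enum { E_REG, E_CONST, E_BINOP } kind;
    int   reg;               /* register number for E_REG        */
    long  value;             /* constant value for E_CONST       */
    char  op;                /* '+', '/', '%', ... for E_BINOP   */
    Expr *left, *right;
};

typedef struct {
    int   dead;              /* instruction eliminated           */
    int   def_reg;           /* register defined (ASGN lhs)      */
    int   n_uses;            /* from the definition/use analysis */
    Expr *rhs;               /* right-hand side expression       */
} HLIcode;

/* Replace every occurrence of register r inside expression e by rhs. */
static Expr *subst(Expr *e, int r, Expr *rhs)
{
    if (e == NULL)
        return NULL;
    if (e->kind == E_REG && e->reg == r)
        return rhs;
    if (e->kind == E_BINOP) {
        e->left  = subst(e->left,  r, rhs);
        e->right = subst(e->right, r, rhs);
    }
    return e;
}

/* If instruction i defines a register that is used exactly once, at
 * instruction j, substitute i's expression into j and eliminate i;
 * repeated application yields the steps shown for Figure 6. */
static void forward_substitute(HLIcode *code, int i, int j)
{
    if (code[i].n_uses == 1 && !code[i].dead) {
        code[j].rhs  = subst(code[j].rhs, code[i].def_reg, code[i].rhs);
        code[i].dead = 1;
    }
}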
Control flow analysis

The control flow analyser structures the control flow graph into generic high-level control structures that are available in most languages. These are conditionals (if..then[..else]), the multiway branch (case), and different types of loops (while(), repeat..until(), and the endless loop). Different methods have been specified in the literature to structure graphs, most of them dealing with the elimination of goto statements from the graph by the introduction of new variables,16,17 code replication,18,19,20 or the use of multilevel exits.21,22 Both the introduction of new variables and code replication modify the apparent semantics of the program, and are therefore not desirable when decompiling binary programs, given that we want to decompile the code ‘as is’. The use of multilevel exit statements is not supported by commonly used languages (e.g. Pascal, C), and thus cannot be part of the generic set of high-level control constructs that can be generated. We developed an algorithm that structures the graph into the set of generic high-level control structures and, whenever it determines that a particular subgraph is not one of the generic constructs, uses a goto. Note that the minimum number of gotos is always used. This algorithm is described in Reference 23.
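As a highly simplified illustration of one structuring decision (the data structures are ours; follow-node computation, loop structuring and case structuring are omitted, and the complete algorithm is in Reference 23), a two-way header node can be classified as follows:

/* Simplified sketch: classifying a two-way conditional node. */
#include <stddef.h>

typedef struct Node Node;
struct Node {
    Node *then_succ, *else_succ;   /* the two out-edges of the conditional */
    Node *follow;                  /* node where the branches rejoin       */
    enum { UNSTRUCTURED, IF_THEN, IF_THEN_ELSE } kind;
};

/* A two-way header one of whose branches is the join point is an if..then
 * (negating the condition if necessary); when both branches later meet at
 * a common follow node it is an if..then..else; anything else falls back
 * to goto generation. */
static void structure_if(Node *hdr, Node *common_follow)
{
    if (hdr->else_succ == common_follow || hdr->then_succ == common_follow) {
        hdr->kind   = IF_THEN;
        hdr->follow = common_follow;
    } else if (common_follow != NULL) {
        hdr->kind   = IF_THEN_ELSE;
        hdr->follow = common_follow;
    } else {
        hdr->kind   = UNSTRUCTURED;    /* a goto will be generated */
    }
}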
[Figure 7 shows two control flow graphs. Before the transformation, a node jcond ((si * 5) == 50) branches to a printf() node when false and to a second node jcond (di < si) when true; the second node branches to the same printf() node when true and to other code when false. After the transformation, a single node jcond (((si * 5) != 50) or (di < si)) branches to the printf() node or to the other code.]

Figure 7. Short-circuit evaluation graph
A second structuring phase can be implemented to check for short-circuit evaluation graphs. These graphs can be transformed into simpler graphs that hold two or more compound conditions in the one basic block, rather than requiring the generation of high-level code that uses two or more nested if..then statements. Figure 7 illustrates an example of a compound or condition. The top basic block checks for the equality of (si * 5) and 50. If this is false, a printf() node is reached. On the other hand, if the equality is true, a second condition is checked: di < si. If this condition is true, the same printf() basic block is reached, otherwise some other code is reached. Rather than generating C code that requires a goto for these conditions (as they are not properly nested), the two conditions can be merged into a compound or condition that negates the first condition (since the printf() node is reached whenever the first condition is false) and leaves the second condition as it is (given that this condition also reaches printf() when it is true). The intermediate basic block and edges are removed from the graph. The complete structuring algorithm is described in References 15 and 24.
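The effect of the merge can be seen in the C code that would correspond to Figure 7; the fragment below is illustrative only, with si, di and the message strings standing in for the real operands and calls:

/* Illustrative fragment of the Figure 7 transformation at the C level. */
#include <stdio.h>

static void before_merge(int si, int di)
{
    /* the unmerged graph forces a goto, as the conditions are not properly nested */
    if ((si * 5) == 50) {
        if (di < si)
            goto print_it;
        puts("other code");
        return;
    }
print_it:
    printf("condition holds\n");
}

static void after_merge(int si, int di)
{
    /* the merged graph: first condition negated, compound or condition */
    if (((si * 5) != 50) || (di < si))
        printf("condition holds\n");
    else
        puts("other code");
}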
The back-end

The back-end module is language dependent, as it deals with the target high-level language. This module optionally restructures the graph into control constructs available only in the particular target language, and then generates code for this language (see Figure 8).

The restructuring phase is optional and aims at structuring the graph even further, so that control structures that are available in the target language, but not present in the generic set of control structures of the structuring algorithm described previously, are utilized. For instance, if the target language is Ada, multilevel exits are allowed. After the graph has been structured, multilevel exits look like a loop with abnormal goto exits. The restructuring phase can check the target destination of each goto, and determine whether an exit(i) statement is suitable instead. Another example is the for loop; such a loop is equivalent to a while()
structured control flow graph + high-level intermediate code → Restructuring → HLL code generation → HLL program

Figure 8. Back-end phases
loop that makes use of an induction variable. In this case, the induction variable needs to be found.
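A hedged sketch of this induction-variable test, using our own loop representation rather than dcc's restructurer:

/* Sketch of recovering a for loop from a while() loop. */
#include <stdbool.h>

typedef struct {
    int  var;                /* variable assigned: x = ...                 */
    bool is_increment;       /* true when of the form x = x + c            */
} Stmt;

typedef struct {
    Stmt *init;              /* statement immediately before the loop      */
    int   cond_var;          /* variable tested in the while() condition   */
    Stmt *body;              /* loop body statements                       */
    int   nbody;
    bool  is_for;            /* set when the loop can be emitted as for()  */
} WhileLoop;

/* A while() whose condition tests a variable that is initialized just
 * before the loop and incremented as the last statement of the body can
 * be emitted as:  for (init; cond; increment) { body minus increment }  */
static void restructure_for(WhileLoop *w)
{
    if (w->nbody == 0 || w->init == NULL)
        return;
    Stmt *last = &w->body[w->nbody - 1];
    if (w->init->var == w->cond_var &&
        last->var == w->cond_var && last->is_increment)
        w->is_for = true;
}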
The final phase is the HLL code generation, which generates code for the target HLL based on the control flow graph an
