`
`Accurate Comparison of Binary Executables
`Accurate Comparison of Binary Executables
`
`Martial Bourquin Andy King Martial Bourquin Andy King
`
`
`University of Kent University of Kent
`
`bourquin.martial@gmail.com bourquin.martial@gmail.com
`
`a.m.king@kent.ac.uk a.m.king@kent.ac.uk
`
`
`
`Edward Robbins Edward Robbins
`
`
`
`er209@kent.ac.uk er209@kent.ac.uk
`
`Abstract
`Abstract
`
`As the volume of malware inexorably rises, comparison of binary As the volume of malware inexorably rises, comparison of binary
`
`code is of increasing importance to security analysts as a method of code is of increasing importance to security analysts as a method of
`
`automatically classifying new malware samples; purportedly new automatically classifying new malware samples; purportedly new
`examples of malware are frequently a simple evolution of exist-
`examples of malware are frequently a simple evolution of exist-
`ing code, whose differences stem only from a need to avoid de-
`ing code, whose differences stem only from a need to avoid de-
`tection. This paper presents a polynomial algorithm for calculat-
`tection. This paper presents a polynomial algorithm for calculat-
`
`ing the differences between two binaries, obtained by fusing the ing the differences between two binaries, obtained by fusing the
`
`well-known BinDiff algorithm with the Hungarian algorithm for well-known BinDiff algorithm with the Hungarian algorithm for
`
`bi-partite graph matching. This significantly improves the matching bi-partite graph matching. This significantly improves the matching
`accuracy. Additionally a meaningful metric of similarity is calcu-
`accuracy. Additionally a meaningful metric of similarity is calcu-
`lated, based on graph edit distance, from which an informed com-
`lated, based on graph edit distance, from which an informed com-
`parison of the binaries can be made. The accuracy of this method
`parison of the binaries can be made. The accuracy of this method
`
`over the standard approach is demonstrated. over the standard approach is demonstrated.
`
`
`1. Introduction 1. Introduction
`
`Automated auditing of binary code is a topic of increasing interest Automated auditing of binary code is a topic of increasing interest
`to both academia and industry. The pace of software development is
`to both academia and industry. The pace of software development is
`such that governments and other organisations must deal with soft-
`such that governments and other organisations must deal with soft-
`ware updates on an almost daily basis that must be vetted for vul-
`ware updates on an almost daily basis that must be vetted for vul-
`
`nerabilities. Meanwhile the anti-virus (AV) industry is attempting nerabilities. Meanwhile the anti-virus (AV) industry is attempting
`
`to cope with an almost constant proliferation of new malware, most to cope with an almost constant proliferation of new malware, most
`
`of which are simple evolutions of known code, and needs to iden-of which are simple evolutions of known code, and needs to iden-
`tify, classify and protect against each. For these tasks binary differ-
`tify, classify and protect against each. For these tasks binary differ-
`encing is of great necessity; for vulnerability discovery in software
`encing is of great necessity; for vulnerability discovery in software
`updates the security analyst wishes to examine only the new code
`updates the security analyst wishes to examine only the new code
`
`and so must identify what is old and what is new quickly, while in and so must identify what is old and what is new quickly, while in
`
`malware identification/classification the key questions are whether malware identification/classification the key questions are whether
`
`the code is a mutation of a known piece of malware, and if so, what the code is a mutation of a known piece of malware, and if so, what
`has changed that needs to be accounted for? In contrast to various
`has changed that needs to be accounted for? In contrast to various
`nefarious applications, such as reversing security patches, binary
`nefarious applications, such as reversing security patches, binary
`differencing has also found application in legal settings as a litmus
`differencing has also found application in legal settings as a litmus
`
`test for infringement against open source licensing and copyright test for infringement against open source licensing and copyright
`
`agreements. agreements.
`The most obvious cause of changes between different versions
`The most obvious cause of changes between different versions
`of the same application is simple addition and removal of code.
`of the same application is simple addition and removal of code.
`
`However there are at least two other sources of differences which However there are at least two other sources of differences which
`result in some binaries appearing different, while in fact being func-
`result in some binaries appearing different, while in fact being func-
`
`tionally identical. The first is so-called `server-side polymorphism', tionally identical. The first is so-called `server-side polymorphism',
`typically a result of using different compiler optimisations, a dif-
`typically a result of using different compiler optimisations, a dif-
`
`Permission to make digital or hard copies of all or part of this work for personal or
`Permission to make digital or hard copies of all or part of this work for personal or
`classroom use is granted without fee provided that copies are not made or distributed
`classroom use is granted without fee provided that copies are not made or distributed
`for profit or commercial advantage and that copies bear this notice and the full citation
`for profit or commercial advantage and that copies bear this notice and the full citation
`
`on the first page. To copy otherwise, to republish, to post on servers or to redistribute on the first page. To copy otherwise, to republish, to post on servers or to redistribute
`
`to lists, requires prior specific permission and/or a fee. to lists, requires prior specific permission and/or a fee.
`PPREW '13 Jan 26, 2013 Rome, Italy.
`PPREW '13 Jan 26, 2013 Rome, Italy.
`
`Copyright © 2013 ACM 978-1-4503-1857-0/13/01...$15.00 Copyright © 2013 ACM 978-1-4503-1857-0/13/01...$15.00
`
`ferent compiler version, or a different compiler all together. The
`ferent compiler version, or a different compiler all together. The
`second is wilful obfuscation in order to bypass detection by signa-
`second is wilful obfuscation in order to bypass detection by signa-
`ture engines of anti-virus software. Regardless of the source of the
`ture engines of anti-virus software. Regardless of the source of the
`changes, they render simple comparison techniques such as byte-
`changes, they render simple comparison techniques such as byte-
`for-byte file differencing almost completely useless. Some exam-
`for-byte file differencing almost completely useless. Some exam-
`ples of such changes are:
`ples of such changes are:
`
`• Sequences of opcodes may remain identical but register alloca-
`• Sequences of opcodes may remain identical but register alloca-
`
`tion can change depending on availability at a compile time, i.e. tion can change depending on availability at a compile time, i.e.
``ecx' may become `edx' and so on.
``ecx' may become `edx' and so on.
`• If instructions do not depend on other instructions they may
`• If instructions do not depend on other instructions they may
`
`be reordered due to pipeline optimisations without affecting be reordered due to pipeline optimisations without affecting
`operational semantics.
`operational semantics.
`
`• Junk/do-nothing code is inserted between instructions in order • Junk/do-nothing code is inserted between instructions in order
`to bypass detection by signature engines of anti-virus software.
`to bypass detection by signature engines of anti-virus software.
`• Obfuscating [7] or diversifying [2, 8] transforms are applied to
`• Obfuscating [7] or diversifying [2, 8] transforms are applied to
`
`a binary to replace sets of instructions by equivalent ones, thus a binary to replace sets of instructions by equivalent ones, thus
`preserving the action of the code but changing its representation
`preserving the action of the code but changing its representation
`
`at the instruction level. In the former case the new code aims to at the instruction level. In the former case the new code aims to
`
`be more cryptic; in the latter case it aims to be unique. be more cryptic; in the latter case it aims to be unique.
`
`
`This paper seeks to address these issues in binary comparison by This paper seeks to address these issues in binary comparison by
`
`bringing together two existing approaches to the problem. Both bringing together two existing approaches to the problem. Both
`
`techniques perform structural matching [16]. In the context of bi-techniques perform structural matching [16]. In the context of bi-
`
`naries this means that they seek to identify/recover key structural naries this means that they seek to identify/recover key structural
`components from each binary to compare and match, establishing
`components from each binary to compare and match, establishing
`an isomorphism between the two binaries.
`an isomorphism between the two binaries.
`The first technique was developed by Thomas Dullien (AKA
`The first technique was developed by Thomas Dullien (AKA
`
`Halvar Flake) and his colleagues in their tool BinDiff [15]. BinDiff Halvar Flake) and his colleagues in their tool BinDiff [15]. BinDiff
`
`attempts to reconstruct the Control Flow Graph (CFG) of each attempts to reconstruct the Control Flow Graph (CFG) of each
`
`binary, as it uses the functions and basic blocks as the units for binary, as it uses the functions and basic blocks as the units for
`
`comparison. comparison.
`Note that the Control Flow Graph can be understood as a graph
`Note that the Control Flow Graph can be understood as a graph
`of graphs; at the top level it consists of the Call Graph (CG), which
`of graphs; at the top level it consists of the Call Graph (CG), which
`links functions together via function calls and returns, while each
`links functions together via function calls and returns, while each
`
`function is itself a CFG linking basic blocks via simple branch-function is itself a CFG linking basic blocks via simple branch-
`es/jumps. To clearly disambiguate whether the term CFG refers to
`es/jumps. To clearly disambiguate whether the term CFG refers to
`the CFG as a graph of graphs of a whole program, or the CFG
`the CFG as a graph of graphs of a whole program, or the CFG
`
`of a single function, henceforth when used in this paper it can be of a single function, henceforth when used in this paper it can be
`assumed that it refers to the graph of a function alone, unless oth-
`assumed that it refers to the graph of a function alone, unless oth-
`erwise stated.
`erwise stated.
`
`The BinDiff algorithm compares functions and basic blocks The BinDiff algorithm compares functions and basic blocks
`
`based on graph-centric properties derived as identifiers for them, based on graph-centric properties derived as identifiers for them,
`such as number of out-going edges, in-coming edges etc. These
`such as number of out-going edges, in-coming edges etc. These
`
`properties make up a tuple for each function or basic block, but it is properties make up a tuple for each function or basic block, but it is
`
`important to note that the tuples are not necessarily unique. BinDiff important to note that the tuples are not necessarily unique. BinDiff
`first creates an initial set of matches consisting only of uniquely
`first creates an initial set of matches consisting only of uniquely
`
`identical functions from each binary. This set is then expanded identical functions from each binary. This set is then expanded
`upon by taking each matched pair and searching for more unique
`upon by taking each matched pair and searching for more unique
`
`matches, but only amongst their unmatched neighbours, thus limit-matches, but only amongst their unmatched neighbours, thus limit-
`
`ing the search space and increasing the likelihood of finding unique ing the search space and increasing the likelihood of finding unique
`matches. This is then repeated with the new matches exhaustively
`matches. This is then repeated with the new matches exhaustively
`
`WIZ, Inc. EXHIBIT - 1047
`WIZ, Inc. v. Orca Security LTD.
`
`Pg. 1
`
`WIZ, Inc. EXHIBIT - 1047
`WIZ, Inc. v. Orca Security LTD.
`
`Pg. 1
`
`
`
`Algorithm 1: Initial match discovery
`Algorithm 1: Initial match discovery
`
`1 function initialMatches(SA, SB); 1 function initialMatches(SA, SB);
`
`2 M 2 M
`
`0; 0;
`
`3 foreach vertex a% E SA do 3 foreach vertex a% E SA do
`
`4 4
`
`foreach Selector E do foreach Selector E do
`5
`5
`if (a,, bj) E(a,,SB) then
`if (a,, bj) E(a,,SB) then
`6
`6
`
`4— .A4 U {az 4— .A4 U {az
`M
`M
`b3};
`b3};
`
`7 7
`
`SAVatl; SAVatl;
`
`8 8
`
`Sg\fbil; Sg\fbil;
`
`SB SB
`9
`9
`break;
`break;
`
`
`
`SA SA
`
`10 return (M, SA, SB);
`10 return (M, SA, SB);
`
`
`until no new unique matches can be found. The treatment of chil-until no new unique matches can be found. The treatment of chil-
`
`dren and parent nodes are discussed subsequently. The BinDiff al-dren and parent nodes are discussed subsequently. The BinDiff al-
`
`gorithm can be applied to match blocks in an analogous way to gorithm can be applied to match blocks in an analogous way to
`matching functions.
`matching functions.
`The second technique is also graph-centric [14], and is based
`The second technique is also graph-centric [14], and is based
`
`on a thread of work in Graph-Edit-Distance (GED) computation on a thread of work in Graph-Edit-Distance (GED) computation
`
`[17, 23]. GED, from graphing theory, is the minimal number of edit [17, 23]. GED, from graphing theory, is the minimal number of edit
`
`operations required to transform a graph gA into a graph gB [30]. operations required to transform a graph gA into a graph gB [30].
`A series of edit operations is called an edit path. With a directed
`A series of edit operations is called an edit path. With a directed
`labelled graph, an edit operation is either vertex substitution, vertex
`labelled graph, an edit operation is either vertex substitution, vertex
`insertion or vertex deletion. A vertex substitution occurs when a
`insertion or vertex deletion. A vertex substitution occurs when a
`
`vertex in graph gA is substituted for one from graph cB. Edge vertex in graph gA is substituted for one from graph cB. Edge
`
`substitutions are defined analogously. Vertex insertion occurs when substitutions are defined analogously. Vertex insertion occurs when
`
`a vertex from graph cB is added to graph gA. Edge insertion a vertex from graph cB is added to graph gA. Edge insertion
`is defined in a corresponding manner. Vertex deletion and edge
`is defined in a corresponding manner. Vertex deletion and edge
`deletion are defined similarly.
`deletion are defined similarly.
`There are multiple edit paths from one graph to another, as one
`There are multiple edit paths from one graph to another, as one
`
`would expect, but the desired path is the one with the lowest total would expect, but the desired path is the one with the lowest total
`
`cost of edit operations. GED also provides a convenient and mean-cost of edit operations. GED also provides a convenient and mean-
`
`ingful metric for measuring graph, and hence binary, similarity. Un-ingful metric for measuring graph, and hence binary, similarity. Un-
`
`fortunately computation of the GED is an NP-hard problem [33]. fortunately computation of the GED is an NP-hard problem [33].
`
`However, it has been shown [26, 30] that an approximation can However, it has been shown [26, 30] that an approximation can
`be obtained by transforming the problem into a weighted bipartite
`be obtained by transforming the problem into a weighted bipartite
`graph matching problem, which can be solved in cubic time with
`graph matching problem, which can be solved in cubic time with
`the Hungarian algorithm [24]. Furthermore, Hu et Al. have shown
`the Hungarian algorithm [24]. Furthermore, Hu et Al. have shown
`
`that performing automatic analysis and classification of malware that performing automatic analysis and classification of malware
`
`samples into families using an approximation of the GED is indeed samples into families using an approximation of the GED is indeed
`
`possible [17]. possible [17].
`
`BinDiff is fast and moreover the matches it makes are usually BinDiff is fast and moreover the matches it makes are usually
`accurate. The problem is that the matches it derives are often
`accurate. The problem is that the matches it derives are often
`incomplete: It frequently fails to match a pair of functions that
`incomplete: It frequently fails to match a pair of functions that
`are conspicuously similar. This issue stems from the fact that a
`are conspicuously similar. This issue stems from the fact that a
`
`function can exist in one binary whose tuple does not uniquely function can exist in one binary whose tuple does not uniquely
`match the tuple of a function in the other. The real problem is
`match the tuple of a function in the other. The real problem is
`that basing the matching of functions purely on tuples is brittle,
`that basing the matching of functions purely on tuples is brittle,
`
`as they abstract the structural features of a function. This is not a as they abstract the structural features of a function. This is not a
`problem if the desire is to only match functions which are identical
`problem if the desire is to only match functions which are identical
`as deemed by the abstraction, rather than those that have undergone
`as deemed by the abstraction, rather than those that have undergone
`
`modification and are similar but not identical. It is arguable whether modification and are similar but not identical. It is arguable whether
`
`this is an issue when two related executables are scrutinised so this is an issue when two related executables are scrutinised so
`as to diagnose the effect of a patch. However, for the problem of
`as to diagnose the effect of a patch. However, for the problem of
`
`malware classification, namely quantifying the degree of similarity malware classification, namely quantifying the degree of similarity
`
`between two arbitrary binaries, it is important to also match similar between two arbitrary binaries, it is important to also match similar
`functions, a concept which is crystalised with the notion of shortest
`functions, a concept which is crystalised with the notion of shortest
`
`edit path. BinDiff does provide a metric for the similarity of the edit path. BinDiff does provide a metric for the similarity of the
`graphs, but it is not particularly meaningful as the developers of
`graphs, but it is not particularly meaningful as the developers of
`
`BinDiff themselves concede [22]. On the other hand GED is one of, BinDiff themselves concede [22]. On the other hand GED is one of,
`
`perhaps the, most natural graph metric [11], and its solution always perhaps the, most natural graph metric [11], and its solution always
`provides a suggested match for remaining unmatched nodes.
`provides a suggested match for remaining unmatched nodes.
`
`Algorithm 2: Match propagation
`Algorithm 2: Match propagation
`
`1 function propagateMatches(./Vi, SA, SB); 1 function propagateMatches(./Vi, SA, SB);
`2 foreach {ai
`
`bj} E M do bj} E M do
`2 foreach {ai
`3
`7F
`3
`7F
`foreach
`Property
`do
`foreach
`Property
`do
`4
`4
`
`7r(ai, 7r(ai,
`
`SA SA
`
`SA); SA);
`s
`s
`
`ir(bj,SB); ir(bj,SB);
`6
`6
`0 then
`if S'A
`
`0
`0 then
`
`if S'A
`0
`A S iB
`A S iB
`
`7 7
`E S'A do
`foreach
`vertex
`E S'A do
`foreach
`vertex
`8
`
`foreach Selector E do
`8
`
`foreach Selector E do
`9
`
`(aii, cij) 4-- e(aii , SB ) then (aii, cij) 4-- e(aii , SB ) then
`9
`if
`if
`
`to to
`
`.A4 U {di .A4 U {di
`
`bij}; bij};
`11
`
`SA S'A \{a1}; SA S'A \{a1};
`11
`12
`S /B\{b /j};
`12
`S /B\{b /j};
`SB
`SB
`13
`SA SAVaa;
`13
`SA SAVaa;
`
`14 14
`
`SB\{b'j}; SB\{b'j};
`
`SB SB
`15
`15
`break;
`break;
`
`16 return (M, SA, SB);
`16 return (M, SA, SB);
`
`
`The work presented here performs automated static compari-The work presented here performs automated static compari-
`
`son of binary executable files by combining the two approaches son of binary executable files by combining the two approaches
`outlined. The BinDiff algorithm for matching both functions and
`outlined. The BinDiff algorithm for matching both functions and
`basic blocks between two binaries is improved by augmentation
`basic blocks between two binaries is improved by augmentation
`with the Hungarian algorithm, which finds matches with the min-
`with the Hungarian algorithm, which finds matches with the min-
`
`imum possible GED over all functions and basic blocks. The net imum possible GED over all functions and basic blocks. The net
`
`effect of overlaying the Hungarian algorithm on top of the Bin-effect of overlaying the Hungarian algorithm on top of the Bin-
`
`Diff algorithm is a more robust matching procedure that mops up Diff algorithm is a more robust matching procedure that mops up
`and matches functions that would otherwise remain unmatched. We
`and matches functions that would otherwise remain unmatched. We
`consider this to be the main contribution of the work. Moreover, the
`consider this to be the main contribution of the work. Moreover, the
`mop up phase is cubic, which is pleasing considering the computa-
`mop up phase is cubic, which is pleasing considering the computa-
`
`tional complexity of GED. tional complexity of GED.
`
`GED is also adapted to provide an accurate metric for the GED is also adapted to provide an accurate metric for the
`
`edit operations required to transform one binary into another. The edit operations required to transform one binary into another. The
`
`system is implemented in a novel way that provides an accurate system is implemented in a novel way that provides an accurate
`
`final edit path to the analyst at the end of the diffing process, final edit path to the analyst at the end of the diffing process,
`and quantitative accuracy results are presented for the number
`and quantitative accuracy results are presented for the number
`of functions matched that conclusively demonstrate the value of
`of functions matched that conclusively demonstrate the value of
`this method. As far as we know, our work also represents a step
`this method. As far as we know, our work also represents a step
`
`change in systematic evaluation; previously accuracy testing has change in systematic evaluation; previously accuracy testing has
`
`been carried out on a few examples, at best. been carried out on a few examples, at best.
`
`2. BinDiff 2. BinDiff
`
`As briefly explained in section 1, BinDiff [16] associates each basic
`As briefly explained in section 1, BinDiff [16] associates each basic
`block/function with a tuple that describes some of its properties.
`block/function with a tuple that describes some of its properties.
`
`For example, the tuple for a function is (a, 13, 7) where: For example, the tuple for a function is (a, 13, 7) where:
`
`• a is the number of basic blocks in the CFG of the function.
`• a is the number of basic blocks in the CFG of the function.
`
`• /3 is the number of edges in the CFG of the function.
`• /3 is the number of edges in the CFG of the function.
`
`• 7 is the number of function calls in the CFG of the function.
`• 7 is the number of function calls in the CFG of the function.
`
`
`In fact, in the later paper by Dullien [9], the concept of matching In fact, in the later paper by Dullien [9], the concept of matching
`is generalised using the idea of a selector. A selector is a function
`is generalised using the idea of a selector. A selector is a function
`
`that takes a vertex a, from a graph GA, and a set SB of vertices that takes a vertex a, from a graph GA, and a set SB of vertices
`
`from another graph GB and returns a vertex from SB that uniquely from another graph GB and returns a vertex from SB that uniquely
`matches a, if precisely one exists. Thus matching based on the tuple
`matches a, if precisely one exists. Thus matching based on the tuple
`
`defined above can be thought of as a selector that simply compares defined above can be thought of as a selector that simply compares
`the elements of the tuples for equality and uniqueness. Another
`the elements of the tuples for equality and uniqueness. Another
`
`example of a selector is a checksum selector based on a cyclic example of a selector is a checksum selector based on a cyclic
`
`redundancy check (CRC32) of the machine code that constitutes a redundancy check (CRC32) of the machine code that constitutes a
`block. For unstripped binaries the symbolic names of the functions
`block. For unstripped binaries the symbolic names of the functions
`
`Pg. 2
`
`Pg. 2
`
`
`
`Algorithm 3: BinDiff
`Algorithm 3: BinDiff
`1 function binDiff(cA, cs);
`1 function binDiff(cA, cs);
`
`2 SAS g A; 2 SAS g A;
`
`3 SB 3 SB
`gg;
`gg;
`
`4 A4/ 4 A4/
`0;
`0;
`5 (M, SA, SB)
`5 (M, SA, SB)
`
`initialMatches(SA, SB); initialMatches(SA, SB);
`6 while M' M do
`6 while M' M do
`7
`7
`
`M; M;
`
`8 8
`(M, SA, SB)
`(M, SA, SB)
`
`propagateMatches(M, SA, SB); propagateMatches(M, SA, SB);
`9 return (M, SA, SB);
`9 return (M, SA, SB);
`
`provide a natural selector [9], however this selector should be
`provide a natural selector [9], however this selector should be
`applied with caution as very different functions can share the same
`applied with caution as very different functions can share the same
`name, for instance main.
`name, for instance main.
`The BinDiff algorithm finds an initial set of matches by com-
`The BinDiff algorithm finds an initial set of matches by com-
`paring (using selectors) all vertices SA in the first graph with all
`paring (using selectors) all vertices SA in the first graph with all
`
`the vertices SB in the second graph. In Dullien's first publications the vertices SB in the second graph. In Dullien's first publications
`
`[14, 15] these initial matches were then expanded upon by taking all [14, 15] these initial matches were then expanded upon by taking all
`
`neighbours of a pair of matched vertices, and searching for matches neighbours of a pair of matched vertices, and searching for matches
`
`amongst those. This limited the vertices that were searched, and in amongst those. This limited the vertices that were searched, and in
`so doing increased the likelihood of finding new matches. However,
`so doing increased the likelihood of finding new matches. However,
`Dullien later [9] generalises neighbours to more abstract concepts
`Dullien later [9] generalises neighbours to more abstract concepts
`that he refers to as properties. Consider a function that takes a ver-
`that he refers to as properties. Consider a function that takes a ver-
`
`tex and returns all neighbours of that vertex; it is in fact taking a tex and returns all neighbours of that vertex; it is in fact taking a
`
`vertex and returning a subset of the vertices in the graph for which vertex and returning a subset of the vertices in the graph for which
`
`the neighbour relationship holds. Properties abstract the construc-the neighbour relationship holds. Properties abstract the construc-
`tion of subsets to arbitrary relationships, for example, the parent
`tion of subsets to arbitrary relationships, for example, the parent
`property can be used to return all parents of a vertex, or the child
`property can be used to return all parents of a vertex, or the child
`property to return all children.
`property to return all children.
`
`2.1 BinDiff Algorithm
`2.1 BinDiff Algorithm
`Initial matches are found by using all selectors across all the ver-
`Initial matches are found by using all selectors across all the ver-
`tices in the first graph cA. If a selector finds a uniquely matching
`tices in the first graph cA. If a selector finds a uniquely matching
`vertex in the second graph gB it returns a match (at, ba). The re-
`vertex in the second graph gB it returns a match (at, ba). The re-
`
`turned matches then create an initial mapping M, containing all turned matches then create an initial mapping M, containing all
`
`matching vertices. The initial match discovery algorithm is shown matching vertices. The initial match discovery algorithm is shown
`
`in Algorithm 1. in Algorithm 1.
`The next step of the BinDiff algorithm is to find more matches
`The next step of the BinDiff algorithm is to find more matches
`based on those found initially. The propagateMatches algorithm
`based on those found initially. The propagateMatches algorithm
`listed in Algorithm 2 does exactly this. For each initial match
`listed in Algorithm 2 does exactly this. For each initial match
`
`properties are used to create subsets S'A and SiB of the remaining properties are used to create subsets S'A and SiB of the remaining
`
`unmatched graph vertices, consisting of all the neighbours of the unmatched graph vertices, consisting of all the neighbours of the
`
`vertices in the match, or all the parents, etc., depending on which vertices in the match, or all the parents, etc., depending on which
`property is used. Selectors are then used on the vertices in the
`property is used. Selectors are then used on the vertices in the
`
`subsets to find new matches. Note that, as in initial match discovery, subsets to find new matches. Note that, as in initial match discovery,
`after a match is discovered the vertices must be removed from the
`after a match is discovered the vertices must be removed from the
`
`sets so that the algorithm only ever searches amongst unmatched sets so that the algorithm only ever searches amongst unmatched
`
`vertices. vertices.
`Finally the main program loop (Algorithm 3) brings all this
`Finally the main program loop (Algorithm 3) brings all this
`
`together by first creating the initial matches, and then by calling together by first creating the initial matches, and then by calling
`
`propagate until no new matches are discovered. Like many iterative propagate until no new matches are discovered. Like many iterative
`algorithms BinDiff can be reformulated in terms of so-called delta
`algorithms BinDiff can be reformulated in terms of so-called delta
`
`sets so that propogateMatches is not called on all of M, but merely sets so that propogateMatches is not called on all of M, but merely
`
`on those pairs that have been most recently added. on those pairs that have been most recently added.
`
`Illustrated Example
`2.2
`Illustrated Example
`2.2
`We will now consider an example execution of the BinDiff algo-
`We will now consider an example execution of the BinDiff algo-
`
`rithm by comparing a simple binary with itself, using only the stan-rithm by comparing a simple binary with itself, using only the stan-
`dard 3-tuple selector given previously. Listing 1 is a listing of the
`dard 3-tuple selector given previously. Listing 1 is a listing of the
`
`C source code of the program, while Figure 1 shows the recovered C source code of the program, while Figure 1 shows the recovered
`
`call graph (CG). Notice that only 8 functions can be seen in the call graph (CG). Notice that only 8 functions can be seen in the
`source code of the program, while the CG contains 18 nodes. This
`source code of the program, while the CG contains 18 nodes. This
`
`
`
`Listing 1. A basic C program Listing 1. A basic C program
`
`
`#include <stdio .h> #include <stdio .h>
`
`#include <stdlib .h> #include <stdlib .h>
`
`#include <time .h> #include <time .h>
`
`void DO {
`void DO {
`
`printf ("\n"); return; printf ("\n"); return;
`
`}
`}
`
`void CO {
`void CO {
`
`DO; DO;
`printf (" \n");
`printf (" \n");
`
`
`
`} }
`
`
`void B() { void B() {
`CO;
`CO;
`
`printf("\n"); printf("\n");
`
`}
`}
`
`void A() {
`void A() {
`
`BO; BO;
`printf("\n");
`printf("\n");
`
`}
`}
`
`int main() {
`int main() {
`srand(45);
`srand(45);
`AO;
`AO;
`
`BO; BO;
`
`C(); C();
`
`DO; DO;
`
`rand(); rand();
`
`is due to code inserted by the compiler for program initialisation,
`is due to code inserted by the compiler for program initialisation,
`
`and support functions such as srand. Since we are comparing the and support functions such as srand. Since we are comparing the
`
`binary with itself we would expect that all vertices will be matched. binary with itself we would expect that all vertices will be matched.
`
`15(1,0,0)
`15(1,0,0)
`
`
`
`7(791 7(791
`
`
`
`6(221 6(221
`
`
`
`3(110 3(110
`
`2(1,1,0) 1
`2(1,1,0) 1
`
`5(1,1,0
`5(1,1,0
`
`
`
`13(19 22 6 13(19 22 6
`
`12(6 6 2) 2)
`12(6 6 2) 2)
`
`
`2
`
`2
`
`,6,2) ,6,2)
`
`11(611(6
`
`6
`6
`
`2)2)
`
`
`
`10(610(6
`
`9(5 5,1
`9(5 5,1
`
`4(110
`4(110
`
`
`
`14(7,8,3 14(7,8,3
`
`
`
`0(553 0(553
`
`
`
`8

Accessing this document will incur an additional charge of $.
After purchase, you can access this document again without charge.
Accept $ ChargeStill Working On It
This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.
Give it another minute or two to complete, and then try the refresh button.
A few More Minutes ... Still Working
It can take up to 5 minutes for us to download a document if the court servers are running slowly.
Thank you for your continued patience.

This document could not be displayed.
We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.
You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.
Set your membership
status to view this document.
With a Docket Alarm membership, you'll
get a whole lot more, including:
- Up-to-date information for this case.
- Email alerts whenever there is an update.
- Full text search for other cases.
- Get email alerts whenever a new case matches your search.

One Moment Please
The filing “” is large (MB) and is being downloaded.
Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!
If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document
We are unable to display this document, it may be under a court ordered seal.
If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.
Access Government Site