throbber
The First Collision for Full SHA-1
`
`Marc Stevens1(B), Elie Bursztein2, Pierre Karpman1,
`Ange Albertini2, and Yarik Markov2
`
`1 CWI Amsterdam, Amsterdam, The Netherlands
`info@shattered.io
`2 Google Research, Mountain View, USA
`https://shattered.io
`
`Abstract. SHA-1 is a widely used 1995 NIST cryptographic hash func-
`tion standard that was officially deprecated by NIST in 2011 due to
`fundamental security weaknesses demonstrated in various analyses and
`theoretical attacks.
`Despite its deprecation, SHA-1 remains widely used in 2017 for docu-
`ment and TLS certificate signatures, and also in many software such as
`the GIT versioning system for integrity and backup purposes.
`A key reason behind the reluctance of many industry players to replace
`SHA-1 with a safer alternative is the fact that finding an actual collision
`has seemed to be impractical for the past eleven years due to the high
`complexity and computational cost of the attack.
`In this paper, we demonstrate that SHA-1 collision attacks have finally
`become practical by providing the first known instance of a collision.
`Furthermore, the prefix of the colliding messages was carefully chosen
`so that they allow an attacker to forge two distinct PDF documents
`with the same SHA-1 hash that display different arbitrarily-chosen visual
`contents.
`We were able to find this collision by combining many special cryptan-
`alytic techniques in complex ways and improving upon previous work. In
`total the computational effort spent is equivalent to 263.1 calls to SHA-1’s
`compression function, and took approximately 6 500 CPU years and 100
`GPU years. While the computational power spent on this collision is
`larger than other public cryptanalytic computations, it is still more than
`100 000 times faster than a brute force search.
`
`Keywords: Hash function · Cryptanalysis · Collision attack · Collision
`example · Differential path construction
`
`1 Introduction
`A cryptographic hash function H : {0, 1}∗ → {0, 1}n is a function that computes
`for any arbitrarily long message M a fixed-length hash value of n bits. It is
`a versatile cryptographic primitive used in many applications including digital
`signature schemes, message authentication codes, password hashing and content-
`addressable storage. The security or even the proper functioning of many of these
`c(cid:3) International Association for Cryptologic Research 2017
`J. Katz and H. Shacham (Eds.): CRYPTO 2017, Part I, LNCS 10401, pp. 570–596, 2017.
`DOI: 10.1007/978-3-319-63688-7 19
`
`Page 1 of 27
`
`CSCO-1028
`Page 1 of 27
`
`

`

`The First Collision for Full SHA-1
`
`571
`
`applications rely on the assumption that it is practically impossible to find col-
`lisions, i.e. two distinct messages x, y that hash to the same value H(x) = H(y).
`When the hash function behaves in a “sufficiently random” way, the expected
`number of calls to H (or in practice its underlying fixed-size function) to find a
`(cid:2)
`π/2 · 2n/2 (see e.g. [33, App-
`collision using an optimal generic algorithm is
`ndix A]); an algorithm that is faster at finding collisions for H is then a collision
`attack for this function.
`A major family of hash function is “MD-SHA”, which includes MD5,
`SHA-1 and SHA-2 that all have found widespread use. This family originally
`started with MD4 [36] in 1990, which was quickly replaced by MD5 [37] in 1992
`due to serious attacks [9,11]. Despite early known weaknesses of its underlying
`compression function [10], MD5 was widely deployed by the software industry
`for over a decade. The MD5CRK project that attempted to find a collision for
`MD5 by brute force was halted early in 2004, when Wang and Yu produced
`explicit collisions [49], found by a groundbreaking attack that pioneered new
`techniques. In a major development, Stevens et al. [45] later showed that a more
`powerful type of attack (the so-called chosen-prefix collision attack) could be
`performed against MD5. This eventually led to the forgery of a Rogue Certifi-
`cation Authority that in principle completely undermined HTTPS security [46]
`in 2008. Despite this, even in 2017 there are still issues in deprecating MD5 for
`signatures [18].
`Currently, the industry is facing a similar challenge in the deprecation of
`SHA-1, a 1995 NIST standard [31]. It is one of the main hash functions of today,
`and it also has been facing important attacks since 2005. Based on previous suc-
`cessful cryptanalysis [3–5] of SHA-0 [30] (SHA-1’s predecessor, that only differs
`by a single rotation in the message expansion function), Wang et al. [48] pre-
`sented in 2005 the very first collision attack on SHA-1 that is faster than brute-
`force. This attack, while groundbreaking, was purely theoretical as its expected
`cost of 269 calls to SHA-1’s compression function was practically out-of-reach.
`Therefore, as a proof of concept, many teams worked on generating collisions
`for reduced versions of the function: 64 steps [8] (with a cost of 235 SHA-1 calls),
`70 steps [7] (cost 244 SHA-1), 73 steps [15] (cost 250.7 SHA-1) and finally 75
`steps [16] (cost 257.7 SHA-1) using extensive GPU computation power.
`In 2013, building on these advances and a novel rigorous framework for ana-
`lyzing SHA-1, the current best collision attack on full SHA-1 was presented by
`Stevens [43] with an estimated cost of 261 calls to the SHA-1 compression func-
`tion. Nevertheless, a publicly known collision still remained out of reach. This
`was also highlighted by Schneier [38] in 2012, when he estimated the cost of a
`SHA-1 collision attack to be around US$ 700K in 2015, down to about US$ 173K
`in 2018 (using calculations by Walker based on a 261 attack cost [43], Amazon
`EC2 spot prices and Moore’s Law), which he deemed to be within the resources
`of criminals.
`More recently, a collision for the full compression function underlying
`SHA-1 was obtained by Stevens et al. [44] using a start-from-the-middle app-
`roach and a highly efficient GPU framework (first used to mount a similar
`
`Page 2 of 27
`
`

`

`572
`
`M. Stevens et al.
`
`freestart attack on the function reduced to 76 steps [21]). This required only
`a reasonable amount of GPU computation power, about 10 days using 64
`GPUs, equivalent to approximately 257.5 calls to SHA-1 on GPU. Based on
`this attack, the authors projected that a collision attack on SHA-1 may cost
`between US$ 75K and US$ 120K by renting GPU computing time on Amazon
`EC2 [39] using spot-instances, which is significantly lower than Schneier’s 2012
`estimates. These new projections had almost immediate effect when CABForum
`Ballot 152 to extend issuance of SHA-1 based HTTPS certificates was with-
`drawn [13], and SHA-1 was deprecated for digital signatures in the IETF’s TLS
`protocol specification version 1.3.
`Unfortunately CABForum restrictions on the use of SHA-1 only apply to
`actively enrolled Certification Authority certificates and not on any other cer-
`tificates, e.g. retracted CA certificates that are still supported by older systems
`(and CA certificates have indeed been retracted for continued use of SHA-1 cer-
`tificates to serve to these older systems unchecked by CABForum regulations1),
`and certificates for other TLS applications including up to 10% of credit card
`payment systems [29,47]. It thus remains in widespread use across the software
`industry for, e.g., digital signatures of software, documents, and many other
`applications, most notably in the GIT versioning system.
`It is well worth noting that academic researchers have not been the only
`ones to compute (and exploit) hash function collisions. Nation-state actors [24,
`25,34] have been linked to the highly advanced espionage malware “Flame” that
`was found targeting the Middle-East in May 2012. As it turned out, it used a
`forged signature to infect Windows machines via a man-in-the-middle attack on
`Windows Update. Using a new technique of counter-cryptanalysis that is able to
`expose cryptanalytic collision attacks given only one message from a colliding
`message pair, it was proven that the forged signature was made possible by a
`then secret chosen-prefix attack on MD5 [12,42].
`
`2 Our Contributions
`
`We are the first to exhibit an example collision for SHA-1, presented in Table 1,
`thereby proving that theoretical attacks on SHA-1 have now become practical.
`Our work builds upon the best known theoretical collision attack [43] with esti-
`mated cost of 261 SHA-1 calls. This is an identical-prefix collision attack, where
`a given prefix P is extended with two distinct near-collision block pairs such
`that they collide for any suffix S:
`(cid:3)
`||M (1)
`P||M (1)
`
`SHA-1
`
`1
`
`2
`
`(cid:4)
`
`||S
`
`= SHA-1
`
`(cid:3)
`
`P||M (2)
`
`1
`
`||M (2)
`
`2
`
`||S
`
`(cid:4)
`
`.
`
`(1)
`
`The computational effort spent on our attack is estimated to be equivalent to
`263.1 SHA-1 calls (see Sect. 6). There is certainly a gap between the theoretical
`attack as presented in [43] and our executed practical attack that was based
`
`1 For instance, SHA-1 certificates are still being sold by CloudFlare at the time of
`writing: https://www.cloudflare.com/ssl/dedicated-certificates/.
`
`Page 3 of 27
`
`

`

`The First Collision for Full SHA-1
`
`573
`
`Table 1. Colliding message blocks for SHA-1.
`
`4e a9 62 69 7c 87 6e 26 74 d1 07 f0 fe c6 79 84 14 f5 bf 45
`
`7f 46 dc 93 a6 b6 7e 01 3b 02 9a aa 1d b2 56 0b
`45 ca 67 d6 88 c7 f8 4b 8c 4c 79 1f e0 2b 3d f6
`14 f8 6d b1 69 09 01 c5 6b 45 c1 53 0a fe df b7
`60 38 e9 72 72 2f e7 ad 72 8f 0e 49 04 e0 46 c2
`
`8d 64 d6 17 ff ed 53 52 eb c8 59 15 5e c7 eb 34 f3 8a 5a 7b
`
`30 57 0f e9 d4 13 98 ab e1 2e f5 bc 94 2b e3 35
`42 a4 80 2d 98 b5 d7 0f 2a 33 2e c3 7f ac 35 14
`e7 4d dc 0f 2c c1 a8 74 cd 0c 78 30 5a 21 56 64
`61 30 97 89 60 6b d0 bf 3f 98 cd a8 04 46 29 a1
`1e ac b2 5e d5 97 0d 10 f1 73 69 63 57 71 bc 3a 17 b4 8a c5
`
`4e a9 62 69 7c 87 6e 26 74 d1 07 f0 fe c6 79 84 14 f5 bf 45
`
`73 46 dc 91 66 b6 7e 11 8f 02 9a b6 21 b2 56 0f
`f9 ca 67 cc a8 c7 f8 5b a8 4c 79 03 0c 2b 3d e2
`18 f8 6d b3 a9 09 01 d5 df 45 c1 4f 26 fe df b3
`dc 38 e9 6a c2 2f e7 bd 72 8f 0e 45 bc e0 46 d2
`
`8d 64 c8 21 ff ed 52 e2 eb c8 59 15 5e c7 eb 36 73 8a 5a 7b
`
`3c 57 0f eb 14 13 98 bb 55 2e f5 a0 a8 2b e3 31
`fe a4 80 37 b8 b5 d7 1f 0e 33 2e df 93 ac 35 00
`eb 4d dc 0d ec c1 a8 64 79 0c 78 2c 76 21 56 60
`dd 30 97 91 d0 6b d0 af 3f 98 cd a4 bc 46 29 b1
`1e ac b2 5e d5 97 0d 10 f1 73 69 63 57 71 bc 3a 17 b4 8a c5
`
`CV0
`M 1
`1
`
`CV 1
`1
`M 1
`2
`
`CV2
`
`CV0
`M 2
`1
`
`CV 2
`1
`M 2
`2
`
`CV2
`
`on it. Indeed, the theoretical attack’s estimated complexity does not include
`the inherent relative loss in efficiency when using GPUs, nor the inefficiency
`we encountered in actually launching a large scale computation distributed over
`several data centers. Moreover, the construction of the second part of the attack
`was significantly more complicated than could be expected from the literature.
`, M (2)
`To find the first near-collision block pair (M (1)
`1 ) we employed the open-
`1
`source code from [43], which was modified to work with our prefix P given
`in Table 2, and for large scale distribution over several data centers. To find
`, M (2)
`the second near-collision block pair (M (1)
`2 ) that leads to the collision was
`2
`more challenging, as the attack cost is known to be significantly higher, but also
`because of additional obstacles.
`In Sect. 5 we will discuss in particular the process of building the second near-
`collision attack. Essentially we followed the same steps as was done for the first
`near-collision attack [43], combining many existing cryptanalytic techniques. Yet
`we further employed the SHA-1 collision search GPU framework from Karpman
`et al. [21] to achieve a significantly more cost efficient attack.
`We also describe two new additional techniques used in the construction of
`the second near-collision attack. The first allowed us to use additional differential
`
`Page 4 of 27
`
`

`

`574
`
`M. Stevens et al.
`
`Table 2. Identical prefix of our collision.
`
`25 50 44 46 2d 31 2e 33 0a 25 e2 e3 cf d3 0a 0a
`0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 57 69 64 74
`68 20 32 20 30 20 52 2f 48 65 69 67 68 74 20 33
`20 30 20 52 2f 54 79 70 65 20 34 20 30 20 52 2f
`53 75 62 74 79 70 65 20 35 20 30 20 52 2f 46 69
`6c 74 65 72 20 36 20 30 20 52 2f 43 6f 6c 6f 72
`53 70 61 63 65 20 37 20 30 20 52 2f 4c 65 6e 67
`74 68 20 38 20 30 20 52 2f 42 69 74 73 50 65 72
`43 6f 6d 70 6f 6e 65 6e 74 20 38 3e 3e 0a 73 74
`72 65 61 6d 0a ff d8 ff fe 00 24 53 48 41 2d 31
`20 69 73 20 64 65 61 64 21 21 21 21 21 85 2f ec
`09 23 39 75 9c 39 b1 a1 c6 3c 4c 97 e1 ff fe 01
`
`%PDF-1.3.%......
`.1 0 obj.<</Widt
`h 2 0 R/Height 3
`0 R/Type 4 0 R/
`Subtype 5 0 R/Fi
`lter 6 0 R/Color
`Space 7 0 R/Leng
`th 8 0 R/BitsPer
`Component 8>>.st
`ream......$SHA-1
`is dead!!!!!./.
`.#9u.9...<L.....
`
`paths around step 23 for increased success probability and more degrees of free-
`dom without compromising the use of an early-stop technique. The second was
`necessary to overcome a serious problem of an unsolvable strongly over-defined
`system of equations over the first few steps of SHA-1’s compression function that
`threatened the feasibility of finishing this project.
`As can be deduced from Eq. 1, our example colliding files only differ in two
`successive random-looking message blocks generated by our attack. We exploit
`these limited differences to craft two colliding PDF documents containing arbi-
`trary distinct images. Examples can be downloaded from https://shattered.io.
`PDFs with the same MD5 hash have previously been constructed by Gebhardt
`et al. [14] by exploiting so-called Indexed Color Tables and Color Transformation
`functions. However, this method is not effective for many common PDF viewers
`that lack support for these functionalities. Our PDFs rely on distinct parsings
`of JPEG images, similar to Gebhardt et al.’s TIFF technique [14] and Albertini
`et al.’s JPEG technique [1]. Yet we improved upon these basic techniques using
`very low-level “wizard” JPEG features such that these work in all common PDF
`viewers, and even allow very large JPEGs that can be used to craft multi-page
`PDFs. This overall approach and the technical details will be described in a
`separate article [2].
`The remainder of this paper is organized as follows. We first give a brief
`description of SHA-1 in Sect. 3. Then, we give a high-level overview of our attack
`in Sect. 4, followed by Sect. 5 that details the entire process and the cryptana-
`lytic techniques employed, where we also highlight improvements with respect
`to previous work. Finally, we discuss the large-scale distributed computations
`required to find the two near-collision block pairs in Sect. 6. The parameters
`used to find the second colliding block are given in the appendix, in Sect. A.
`
`3 The SHA-1 Hash Function
`
`We provide a brief description of SHA-1 as defined by NIST [31]. SHA-1 takes an
`arbitrary-length message and computes a 160-bit hash. It divides the (padded)
`
`Page 5 of 27
`
`

`

`The First Collision for Full SHA-1
`
`575
`
`input message into k blocks M1, . . . , Mk of 512 bits. The 160-bit internal state
`CVj of SHA-1, called the chaining value, is initialized to a predefined initial
`value CV0 = IV . Each message block is then fed to a compression function h
`that updates the chaining value, i.e. CVj+1 = h(CVj, Mj+1), for 0 ≤ j < k,
`where the final CVk is output as the hash.
`The compression function h takes a 160-bit chaining value CVj and a 512-
`bit message block Mj+1 as inputs, and outputs a new 160-bit chaining value
`CVj+1. It mixes the message block into the chaining value as follows, operating
`on words, simultaneously seen as 32-bit strings and as elements of Z/232Z: the
`input chaining value is parsed as five words a, b, c, d, e, and the message block as
`16 words m0, . . . , m15. The latter are expanded into 80 words using the following
`recursive linear equation:
`for 16 ≤ i <80.
`mi = (mi−3 ⊕ mi−8 ⊕ mi−14 ⊕ mi−16)(cid:2)1,
`Starting from (A−4, A−3, A−2, A−1, A0) := (e(cid:2)2, d(cid:2)2, c(cid:2)2, b, a), each mi is mixed
`into an intermediate state over 80 steps i = 0, . . . , 79:
`, A(cid:3)2i−3) + A(cid:3)2i−4 + Ki + mi,
`
`
`Ai+1 = A(cid:2)5i + ϕi(Ai−1, A(cid:3)2
`
`i−2
`where ϕi and Ki are predefined Boolean functions and constants:
`
`ϕi(x, y, z)
`Step i
`ϕIF = (x ∧ y) ∨ (¬x ∧ z)
`0 ≤ i <20
`20 ≤ i <40 ϕXOR = x ⊕ y ⊕ z
`40 ≤ i <60 ϕMAJ = (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z) 0x8f1bbcdc
`60 ≤ i <80 ϕXOR = x ⊕ y ⊕ z
`
`Ki
`
`0x5a827999
`
`0x6ed9eba1
`
`0xca62c1d6
`
`After the 80 steps, the new chaining value is computed as the sum of the
`input chaining value and the final intermediate state:
`, d + A(cid:3)2
`CVj+1 = (a + A80, b + A79, c + A(cid:3)2
`
`, e + A(cid:3)2
`76 ).
`
`77
`
`78
`
`4 Overview of our SHA-1 Collision Attack
`
`We illustrate our attack from a high level in Fig. 1. Starting from identical chain-
`ing values for two messages, we use two pairs of blocks. The differences in the
`first block pair cause a small difference in the output chaining value, which is
`canceled by the difference in the second block pair, leading again to identical
`chaining values and hence a collision (indicated by (2)). We employ differential
`paths that are a precise description of differences in state words and message
`words and of how these differences should propagate through the 80 steps.
`Note that although the first five state words are fixed by the chaining value,
`one can freely modify message words and thus directly influence the next sixteen
`
`Page 6 of 27
`
`

`

`576
`
`M. Stevens et al.
`
`Fig. 1. Attack overview
`
`state words. Moreover, with additional effort this can be extended to obtain lim-
`ited influence over another eight state words. However, control over the remaining
`state words (indicated by (1)) is very hard and thus requires very sparse target
`differences that correctly propagate with probability as high as possible. Fur-
`thermore, these need to be compatible with differences in the expanded message
`words. The key solution is the concept of local collisions [5], where any state bit-
`difference introduced by a perturbation message bit-difference is to be canceled
`in the next five steps using correction message bit-differences.
`To ensure all message word bit differences are compatible with the linear
`message expansion, one uses a disturbance vector (DV) [5] that is a correctly
`expanded message itself, but where every “1” bit marks the start of a local
`collision. The selection of a good disturbance vector has a very high impact on
`the overall attack cost. As previously shown by Wang et al. [48], the main reason
`of using two block pairs (i.e. to search for a near-collision over a first message
`block, that is completed to a full collision over a second) instead of only one
`is that this choice alleviates an important restriction on the disturbance vector,
`namely that there are no state differences after the last step. Similarly, it may be
`impossible to unite the input chaining value difference with the local collisions for
`an arbitrary disturbance vector. This was solved by Wang et al. [48] by crafting a
`tailored differential path (called the non-linear (NL) path, indicated by (3)) that
`over the first 16 steps connects the input chaining value differences to the local
`collision differences over the remaining steps (called the linear path, referring to
`the linear message expansion dictating the local collision positions).
`One has to choose a good disturbance vector, then craft a non-linear differ-
`ential path for each of the two near-collision attacks (over the first and second
`message blocks), determine a system of equations over all steps and finally find
`a solution in the form of a message block pair (as indicated by (4A) and (4B)).
`Note that one can only craft the non-linear path for the second near-collision
`attack once the chaining values resulting from the first block pair are known.
`This entire process including our improvements is described below.
`
`5 Near-Collision Attack Procedure
`
`This section describes the overall procedure of each of the two near-collision
`attacks. Since we relied on our modification of Stevens’ public source-code [17,43]
`
`Page 7 of 27
`
`

`

`The First Collision for Full SHA-1
`
`577
`
`DV selection
`
`Craft non-
`linear path
`
`Run attack
`
`Write attack
`algorithm
`
`Determine
`attack
`conditions
`
`Find
`speed-ups
`(boomerangs)
`
`Find
`additional
`conditions
`
`Fix
`solvability
`first steps
`
`Fig. 2. The main steps for each near-collision attack.
`
`for the first near-collision attack, we focus on our extended procedure for our
`second near-collision attack. As shown in Fig. 2, this involves the following steps
`that are further detailed below:
`
`1. selection of the disturbance vector (same for both attacks);
`2. construction of the non-linear differential path;
`3. determine attack conditions over all steps;
`4. find additional conditions beyond the fixed differential path for early-stop;
`5. if necessary fix solvability of attack conditions over the first few steps;
`6. find message modification rules to speed-up collision search;
`7. write the attack algorithm;
`8. finally, run the attack to find a near-collision block pair.
`
`5.1 Disturbance Vector Selection
`
`The selection of which disturbance vector to use is a major choice, as it directly
`determines many aspects of the collision attack. These include the message XOR
`differences, but also in theory the optimal attack choices over the linear path,
`including the optimal set of candidate endings for the non-linear path together
`with optimal linear message-bit equations that maximize the success probability
`over the linear part.
`Historically several approaches have been used to analyze a disturbance vec-
`tor to estimate attack costs over the linear part. Initially, the Hamming weight of
`the DV that counts the active number of local collisions was used (see e.g. [4,35]).
`For the first theoretical attack on SHA-1 with cost 269 SHA-1-calls by Wang
`et al. [48] a more refined measure was used, that counts the number of bit-
`conditions on the state and message bits that ensure that the differential path
`would be followed. This was later refined by Yajima et al. [51] to a more pre-
`cise count by exploiting all possible so-called bit compressions and interactions
`through the Boolean functions. However, this approach does not allow any dif-
`ference in the carry propagation, which otherwise could result in alternate differ-
`ential paths that may improve the overall success probability. Therefore, Mendel
`et al. [28] proposed to use the more accurate probability of single local collisions
`where carry propagations are allowed, in combination with known local collision
`interaction corrections.
`
`Page 8 of 27
`
`

`

`578
`
`M. Stevens et al.
`
`The current state-of-the-art is joint-local-collision analysis (JLCA) intro-
`duced by Stevens [41,43] which given sets of allowed differences for each state
`word Ai and message word mi (given by the disturbance vector) computes the
`exact optimal success probability over the specified steps by exhaustively evalu-
`ating all differential paths with those allowed differences. This approach is very
`powerful as it also provides important information for the next steps, namely
`the set of optimal chaining value differences (by considering arbitrary high prob-
`ability differences for the last five Ais) and the set of optimal endings for the
`non-linear path, together with a corresponding set of message-bit equations,
`using which the optimal highest success probability of the specified steps can
`actually be achieved. The best theoretical collision attack on SHA-1 with cost
`261 SHA-1 calls [43] was built using this analysis. As we build upon this collision
`attack, we use the same disturbance vector, named II(52, 0) by Manuel [26] and
`originally described by Jutla and Patthak [20].
`
`5.2 Construction of a Non-linear Differential Path
`
`Once the disturbance vector and the corresponding linear part of the differential
`path have been fixed, the next step consists in finding a suitable non-linear path
`connecting the chaining value pair (with fixed differences) to the linear part.
`This step needs to be done separately for each near-collision attack of the full
`collision attack2.
`As explained for instance in [43], in the case of the first near-collision attack,
`the attacker has the advantage of two additional freedoms. Firstly, an arbitrary
`prefix can be included before the start of the attack to pre-fulfill a limited number
`of conditions on the chaining value. This allows greater freedom in constructing
`the non-linear path as this does not have to be restricted to a specific value
`of the chaining value pair, whereas the non-linear path for the second near-
`collision attack has to start from the specific given value of input chaining value
`pair. Secondly, it can use the entire set of output chaining value differences with
`the same highest probability. The first near-collision attack is not limited to a
`particular value and succeeds when it finds a chaining value difference in this set,
`whereas the second near-collision attack has to cancel the specific difference in
`the resulting chaining value pair. Theory predicts the first near-collision attack
`to be at least a factor six faster than the second attack [43]. For our collision
`attack it is indeed the second near-collision attack that dominates the overall
`attack complexity.
`Historically, the first non-linear paths for SHA-1 were hand-crafted by Wang
`et al. Several algorithms were subsequently developed to automatically search
`for non-linear paths for MD5, SHA-1, and other functions of the MD-SHA family.
`The first automatic search for SHA-1 by De Canni`ere and Rechberger [8] was
`based on a guess-and-determine approach. This approach tracks the allowed
`
`2 We eventually produced two message block pair solutions for the first near-collision
`attack. This provided a small additional amount of freedom in the search for the
`non-linear path of the second block.
`
`Page 9 of 27
`
`

`

`The First Collision for Full SHA-1
`
`579
`
`values of each bit pair in the two related compression function computations.
`It starts with no constraints on the values of these bit pairs other than the
`chaining value pair and the linear part differences. It then repeatedly restricts
`values on a selected bit pair and then propagates this information via the step
`function and linear message expansion relation, i.e., it determines and eliminates
`previously-allowed values for other bit pairs that are now impossible due the
`added restriction. Whenever a contradiction occurs, the algorithm backtracks
`and chooses a different restriction on the last selected bit pair.
`Another algorithm for SHA-1 was introduced by Yajima et al. [52] that is
`based on a meet-in-the-middle approach. It starts from two fully-specified differ-
`ential paths; the first is obtained from a forward expansion of the input chaining
`value pair, whereas the other is obtained from a backward expansion of the linear
`path. It then tries to connect these two differential paths over the remaining five
`steps in the middle by recursively iterating over all solutions over a particular
`step.
`A similar meet-in-the-middle algorithm was independently first developed for
`MD5 and then adapted to SHA-1 by Stevens et al. [17,41,45], which operates
`on bit-slices and is more efficient. The open-source HashClash project [17] seems
`to be the only publicly available non-linear path construction implementation,
`which we improved as follows. Originally, it expanded a large set of differential
`paths step by step, keeping only the best N paths after each step, for some user-
`specified number N. However, there might be several good differential paths
`that result in the same differences and conditions around the connecting five
`steps, where either none or all lead to fully-connected differential paths. Since
`we only need the best fully-connected differential path we can find, we only need
`to keep a best differential path from each subset of paths with the same differ-
`ences and conditions over the last five steps that were extended. So to remove
`this redundancy, for each step we extend and keep, say, the 4N best paths, then
`we remove all such superfluous paths, and finally keep at most N paths. This
`improvement led to a small but very welcome reduction in the amount of differ-
`ential path conditions under the same path construction parameter choices, but
`also allowed a better positioning of the largest density of sufficient conditions
`for the differential path.
`Construction of a very good non-linear path for the second near-collision
`attack using our improved HashClash version took a small effort with our
`improvements, yet even allowed us to restrict the section with high density of
`conditions to just the first six steps. However, to find a very good non-linear
`differential path that is also solvable turned out to be more complicated. Our
`final solution is described in Sect. 5.5, which in the end did allow us to build
`our attack on the best non-linear path we found without any compromises. The
`fixed version of this best non-linear path is presented in Fig. 3, Sect. A.
`
`5.3 Determine Attack Conditions
`
`Having selected the disturbance vector and constructed a non-linear path that
`bridges into the linear part, the next step is to determine the entire system of
`
`Page 10 of 27
`
`

`

`580
`
`M. Stevens et al.
`
`equations for the attack. This system of equations is expressed entirely over the
`computation of message M (1), and not over M (2), and consists of two types of
`equations:
`
`1. Linear equations over message bits. These are used to control the additive
`signs of the message word XOR differences implied by the disturbance vector.
`Since there are many different “signings” over the linear part with the same
`highest probability, instead of one specific choice one uses a linear hull that
`captures many choices to reduce the amount of necessary equations.
`2. Linear equations over state bits given by a fixed differential path up to some
`step i (that includes the non-linear path). These control whether there is a
`difference in a state bit and which sign it has, furthermore they force target
`differences in the outputs of the Boolean functions ϕi.
`
`We determine this entire system by employing our implementation of joint-
`local-collision analysis that has been improved as follows. JLCA takes input sets
`of allowed differences for each Ai and mi and exhaustively analyzes the set of
`differential paths with those allowed differences, which originally is only used to
`analyze the linear part. We additionally provide it with specific differences for
`Ai and mi as given by the non-linear path, so we can run JLCA over all 80 steps
`and have it output an optimal fixed differential path over steps 0, . . . , 22 together
`with an optimal set of linear equations over message bits over the remaining
`steps. These are optimal results since JLCA guarantees these lead to the highest
`probability that is possible using the given allowed differences, but furthermore
`that a largest linear hull is used to minimize the amount of equations.
`Note that having a fixed differential path over more steps directly provides
`more state bit equations which is helpful in the actual collision search because we
`can apply an early-stop technique. However, this also adds further restrictions on
`Ai limiting a set of allowed differences to a single specific difference. In our case
`limiting A24 would result, besides a drop in degrees of freedom, in a lower overall
`probability, thus we only use a fixed differential path up to step 22, i.e., up to
`A23. Below in Sect. 5.4 we show how we compensated for fewer state equations
`that the actual collision search uses to early stop.
`
`5.4 Find Additional State Conditions
`
`As explained in Sect. 5.3, the system of equations consists of linear equations
`over (expanded) message bits and linear equations over state bits. In the actual
`collision search algorithm, we depend on these state bit equations to stop com-
`putation on a bad current solution as early as possible and start backtracking.
`These state bit equations are directly given by a fixed differential path, where
`every bit difference in the state and message is fixed. Starting from step 23 we
`allow several alternate differential paths that increase success probability, but
`also allow distinct message word differences that lead to a decrease in the overall
`number of equations. Each alternate differential path depends on its own (dis-
`tinct) message word differences and leads to its own state bit equations. To find
`
`Page 11 of 27
`
`

`

`The First Collision for Full SHA-1
`
`581
`
`additional equations, we also consider linear equations over state and message
`bits around steps 21–25. Although in theory these could be computed by JLCA
`by exhaustively reconstructing all alternate differential paths and then deter-
`mining the desired linear equations, we instead took a much simpler approach.
`We generated a large amount of random solutions of the system of equations
`up to step 31 using an unoptimized general collision search algorithm. We then
`proceeded to exhaustively test potential linear equations over at most four state
`bits and message bits around steps 21–25, which is quite efficient as on average
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket