IPR2017-01933, No. 1028 Exhibit - 1028 Stevens, First Collision for Full SHA 1 (P.T.A.B. Aug. 11, 2017)

The First Collision for Full SHA-1
`
`Marc Stevens1(B), Elie Bursztein2, Pierre Karpman1,
`Ange Albertini2, and Yarik Markov2
`
`1 CWI Amsterdam, Amsterdam, The Netherlands
`info@shattered.io
`2 Google Research, Mountain View, USA
`https://shattered.io
`
`Abstract. SHA-1 is a widely used 1995 NIST cryptographic hash func-
`tion standard that was oﬃcially deprecated by NIST in 2011 due to
`fundamental security weaknesses demonstrated in various analyses and
`theoretical attacks.
`Despite its deprecation, SHA-1 remains widely used in 2017 for docu-
`ment and TLS certiﬁcate signatures, and also in many software such as
`the GIT versioning system for integrity and backup purposes.
`A key reason behind the reluctance of many industry players to replace
`SHA-1 with a safer alternative is the fact that ﬁnding an actual collision
`has seemed to be impractical for the past eleven years due to the high
`complexity and computational cost of the attack.
`In this paper, we demonstrate that SHA-1 collision attacks have ﬁnally
`become practical by providing the ﬁrst known instance of a collision.
`Furthermore, the preﬁx of the colliding messages was carefully chosen
`so that they allow an attacker to forge two distinct PDF documents
`with the same SHA-1 hash that display diﬀerent arbitrarily-chosen visual
`contents.
`We were able to ﬁnd this collision by combining many special cryptan-
`alytic techniques in complex ways and improving upon previous work. In
`total the computational eﬀort spent is equivalent to 263.1 calls to SHA-1’s
`compression function, and took approximately 6 500 CPU years and 100
`GPU years. While the computational power spent on this collision is
`larger than other public cryptanalytic computations, it is still more than
`100 000 times faster than a brute force search.
`
`Keywords: Hash function · Cryptanalysis · Collision attack · Collision
`example · Diﬀerential path construction
`
`1 Introduction
`A cryptographic hash function H : {0, 1}∗ → {0, 1}n is a function that computes
`for any arbitrarily long message M a ﬁxed-length hash value of n bits. It is
`a versatile cryptographic primitive used in many applications including digital
`signature schemes, message authentication codes, password hashing and content-
`addressable storage. The security or even the proper functioning of many of these
`c(cid:3) International Association for Cryptologic Research 2017
`J. Katz and H. Shacham (Eds.): CRYPTO 2017, Part I, LNCS 10401, pp. 570–596, 2017.
`DOI: 10.1007/978-3-319-63688-7 19
`
`Page 1 of 27
`
`CSCO-1028
`Page 1 of 27
`
`

`The First Collision for Full SHA-1
`
`571
`
`applications rely on the assumption that it is practically impossible to ﬁnd col-
`lisions, i.e. two distinct messages x, y that hash to the same value H(x) = H(y).
`When the hash function behaves in a “suﬃciently random” way, the expected
`number of calls to H (or in practice its underlying ﬁxed-size function) to ﬁnd a
`(cid:2)
`π/2 · 2n/2 (see e.g. [33, App-
`collision using an optimal generic algorithm is
`ndix A]); an algorithm that is faster at ﬁnding collisions for H is then a collision
`attack for this function.
`A major family of hash function is “MD-SHA”, which includes MD5,
`SHA-1 and SHA-2 that all have found widespread use. This family originally
`started with MD4 [36] in 1990, which was quickly replaced by MD5 [37] in 1992
`due to serious attacks [9,11]. Despite early known weaknesses of its underlying
`compression function [10], MD5 was widely deployed by the software industry
`for over a decade. The MD5CRK project that attempted to ﬁnd a collision for
`MD5 by brute force was halted early in 2004, when Wang and Yu produced
`explicit collisions [49], found by a groundbreaking attack that pioneered new
`techniques. In a major development, Stevens et al. [45] later showed that a more
`powerful type of attack (the so-called chosen-preﬁx collision attack) could be
`performed against MD5. This eventually led to the forgery of a Rogue Certiﬁ-
`cation Authority that in principle completely undermined HTTPS security [46]
`in 2008. Despite this, even in 2017 there are still issues in deprecating MD5 for
`signatures [18].
`Currently, the industry is facing a similar challenge in the deprecation of
`SHA-1, a 1995 NIST standard [31]. It is one of the main hash functions of today,
`and it also has been facing important attacks since 2005. Based on previous suc-
`cessful cryptanalysis [3–5] of SHA-0 [30] (SHA-1’s predecessor, that only diﬀers
`by a single rotation in the message expansion function), Wang et al. [48] pre-
`sented in 2005 the very ﬁrst collision attack on SHA-1 that is faster than brute-
`force. This attack, while groundbreaking, was purely theoretical as its expected
`cost of 269 calls to SHA-1’s compression function was practically out-of-reach.
`Therefore, as a proof of concept, many teams worked on generating collisions
`for reduced versions of the function: 64 steps [8] (with a cost of 235 SHA-1 calls),
`70 steps [7] (cost 244 SHA-1), 73 steps [15] (cost 250.7 SHA-1) and ﬁnally 75
`steps [16] (cost 257.7 SHA-1) using extensive GPU computation power.
`In 2013, building on these advances and a novel rigorous framework for ana-
`lyzing SHA-1, the current best collision attack on full SHA-1 was presented by
`Stevens [43] with an estimated cost of 261 calls to the SHA-1 compression func-
`tion. Nevertheless, a publicly known collision still remained out of reach. This
`was also highlighted by Schneier [38] in 2012, when he estimated the cost of a
`SHA-1 collision attack to be around US$ 700K in 2015, down to about US$ 173K
`in 2018 (using calculations by Walker based on a 261 attack cost [43], Amazon
`EC2 spot prices and Moore’s Law), which he deemed to be within the resources
`of criminals.
`More recently, a collision for the full compression function underlying
`SHA-1 was obtained by Stevens et al. [44] using a start-from-the-middle app-
`roach and a highly eﬃcient GPU framework (ﬁrst used to mount a similar
`
`Page 2 of 27
`
`

`572
`
`M. Stevens et al.
`
`freestart attack on the function reduced to 76 steps [21]). This required only
`a reasonable amount of GPU computation power, about 10 days using 64
`GPUs, equivalent to approximately 257.5 calls to SHA-1 on GPU. Based on
`this attack, the authors projected that a collision attack on SHA-1 may cost
`between US$ 75K and US$ 120K by renting GPU computing time on Amazon
`EC2 [39] using spot-instances, which is signiﬁcantly lower than Schneier’s 2012
`estimates. These new projections had almost immediate eﬀect when CABForum
`Ballot 152 to extend issuance of SHA-1 based HTTPS certiﬁcates was with-
`drawn [13], and SHA-1 was deprecated for digital signatures in the IETF’s TLS
`protocol speciﬁcation version 1.3.
`Unfortunately CABForum restrictions on the use of SHA-1 only apply to
`actively enrolled Certiﬁcation Authority certiﬁcates and not on any other cer-
`tiﬁcates, e.g. retracted CA certiﬁcates that are still supported by older systems
`(and CA certiﬁcates have indeed been retracted for continued use of SHA-1 cer-
`tiﬁcates to serve to these older systems unchecked by CABForum regulations1),
`and certiﬁcates for other TLS applications including up to 10% of credit card
`payment systems [29,47]. It thus remains in widespread use across the software
`industry for, e.g., digital signatures of software, documents, and many other
`applications, most notably in the GIT versioning system.
`It is well worth noting that academic researchers have not been the only
`ones to compute (and exploit) hash function collisions. Nation-state actors [24,
`25,34] have been linked to the highly advanced espionage malware “Flame” that
`was found targeting the Middle-East in May 2012. As it turned out, it used a
`forged signature to infect Windows machines via a man-in-the-middle attack on
`Windows Update. Using a new technique of counter-cryptanalysis that is able to
`expose cryptanalytic collision attacks given only one message from a colliding
`message pair, it was proven that the forged signature was made possible by a
`then secret chosen-preﬁx attack on MD5 [12,42].
`
`2 Our Contributions
`
`We are the ﬁrst to exhibit an example collision for SHA-1, presented in Table 1,
`thereby proving that theoretical attacks on SHA-1 have now become practical.
`Our work builds upon the best known theoretical collision attack [43] with esti-
`mated cost of 261 SHA-1 calls. This is an identical-preﬁx collision attack, where
`a given preﬁx P is extended with two distinct near-collision block pairs such
`that they collide for any suﬃx S:
`(cid:3)
`||M (1)
`P||M (1)
`
`SHA-1
`
`1
`
`2
`
`(cid:4)
`
`||S
`
`= SHA-1
`
`(cid:3)
`
`P||M (2)
`
`1
`
`||M (2)
`
`2
`
`||S
`
`(cid:4)
`
`.
`
`(1)
`
`The computational eﬀort spent on our attack is estimated to be equivalent to
`263.1 SHA-1 calls (see Sect. 6). There is certainly a gap between the theoretical
`attack as presented in [43] and our executed practical attack that was based
`
`1 For instance, SHA-1 certiﬁcates are still being sold by CloudFlare at the time of
`writing: https://www.cloudﬂare.com/ssl/dedicated-certiﬁcates/.
`
`Page 3 of 27
`
`

`The First Collision for Full SHA-1
`
`573
`
`Table 1. Colliding message blocks for SHA-1.
`
`4e a9 62 69 7c 87 6e 26 74 d1 07 f0 fe c6 79 84 14 f5 bf 45
`
`7f 46 dc 93 a6 b6 7e 01 3b 02 9a aa 1d b2 56 0b
`45 ca 67 d6 88 c7 f8 4b 8c 4c 79 1f e0 2b 3d f6
`14 f8 6d b1 69 09 01 c5 6b 45 c1 53 0a fe df b7
`60 38 e9 72 72 2f e7 ad 72 8f 0e 49 04 e0 46 c2
`
`8d 64 d6 17 ff ed 53 52 eb c8 59 15 5e c7 eb 34 f3 8a 5a 7b
`
`30 57 0f e9 d4 13 98 ab e1 2e f5 bc 94 2b e3 35
`42 a4 80 2d 98 b5 d7 0f 2a 33 2e c3 7f ac 35 14
`e7 4d dc 0f 2c c1 a8 74 cd 0c 78 30 5a 21 56 64
`61 30 97 89 60 6b d0 bf 3f 98 cd a8 04 46 29 a1
`1e ac b2 5e d5 97 0d 10 f1 73 69 63 57 71 bc 3a 17 b4 8a c5
`
`4e a9 62 69 7c 87 6e 26 74 d1 07 f0 fe c6 79 84 14 f5 bf 45
`
`73 46 dc 91 66 b6 7e 11 8f 02 9a b6 21 b2 56 0f
`f9 ca 67 cc a8 c7 f8 5b a8 4c 79 03 0c 2b 3d e2
`18 f8 6d b3 a9 09 01 d5 df 45 c1 4f 26 fe df b3
`dc 38 e9 6a c2 2f e7 bd 72 8f 0e 45 bc e0 46 d2
`
`8d 64 c8 21 ff ed 52 e2 eb c8 59 15 5e c7 eb 36 73 8a 5a 7b
`
`3c 57 0f eb 14 13 98 bb 55 2e f5 a0 a8 2b e3 31
`fe a4 80 37 b8 b5 d7 1f 0e 33 2e df 93 ac 35 00
`eb 4d dc 0d ec c1 a8 64 79 0c 78 2c 76 21 56 60
`dd 30 97 91 d0 6b d0 af 3f 98 cd a4 bc 46 29 b1
`1e ac b2 5e d5 97 0d 10 f1 73 69 63 57 71 bc 3a 17 b4 8a c5
`
`CV0
`M 1
`1
`
`CV 1
`1
`M 1
`2
`
`CV2
`
`CV0
`M 2
`1
`
`CV 2
`1
`M 2
`2
`
`CV2
`
`on it. Indeed, the theoretical attack’s estimated complexity does not include
`the inherent relative loss in eﬃciency when using GPUs, nor the ineﬃciency
`we encountered in actually launching a large scale computation distributed over
`several data centers. Moreover, the construction of the second part of the attack
`was signiﬁcantly more complicated than could be expected from the literature.
`, M (2)
`To ﬁnd the ﬁrst near-collision block pair (M (1)
`1 ) we employed the open-
`1
`source code from [43], which was modiﬁed to work with our preﬁx P given
`in Table 2, and for large scale distribution over several data centers. To ﬁnd
`, M (2)
`the second near-collision block pair (M (1)
`2 ) that leads to the collision was
`2
`more challenging, as the attack cost is known to be signiﬁcantly higher, but also
`because of additional obstacles.
`In Sect. 5 we will discuss in particular the process of building the second near-
`collision attack. Essentially we followed the same steps as was done for the ﬁrst
`near-collision attack [43], combining many existing cryptanalytic techniques. Yet
`we further employed the SHA-1 collision search GPU framework from Karpman
`et al. [21] to achieve a signiﬁcantly more cost eﬃcient attack.
`We also describe two new additional techniques used in the construction of
`the second near-collision attack. The ﬁrst allowed us to use additional diﬀerential
`
`Page 4 of 27
`
`

`574
`
`M. Stevens et al.
`
`Table 2. Identical preﬁx of our collision.
`
`25 50 44 46 2d 31 2e 33 0a 25 e2 e3 cf d3 0a 0a
`0a 31 20 30 20 6f 62 6a 0a 3c 3c 2f 57 69 64 74
`68 20 32 20 30 20 52 2f 48 65 69 67 68 74 20 33
`20 30 20 52 2f 54 79 70 65 20 34 20 30 20 52 2f
`53 75 62 74 79 70 65 20 35 20 30 20 52 2f 46 69
`6c 74 65 72 20 36 20 30 20 52 2f 43 6f 6c 6f 72
`53 70 61 63 65 20 37 20 30 20 52 2f 4c 65 6e 67
`74 68 20 38 20 30 20 52 2f 42 69 74 73 50 65 72
`43 6f 6d 70 6f 6e 65 6e 74 20 38 3e 3e 0a 73 74
`72 65 61 6d 0a ff d8 ff fe 00 24 53 48 41 2d 31
`20 69 73 20 64 65 61 64 21 21 21 21 21 85 2f ec
`09 23 39 75 9c 39 b1 a1 c6 3c 4c 97 e1 ff fe 01
`
`%PDF-1.3.%......
`.1 0 obj.<</Widt
`h 2 0 R/Height 3
`0 R/Type 4 0 R/
`Subtype 5 0 R/Fi
`lter 6 0 R/Color
`Space 7 0 R/Leng
`th 8 0 R/BitsPer
`Component 8>>.st
`ream......$SHA-1
`is dead!!!!!./.
`.#9u.9...<L.....
`
`paths around step 23 for increased success probability and more degrees of free-
`dom without compromising the use of an early-stop technique. The second was
`necessary to overcome a serious problem of an unsolvable strongly over-deﬁned
`system of equations over the ﬁrst few steps of SHA-1’s compression function that
`threatened the feasibility of ﬁnishing this project.
`As can be deduced from Eq. 1, our example colliding ﬁles only diﬀer in two
`successive random-looking message blocks generated by our attack. We exploit
`these limited diﬀerences to craft two colliding PDF documents containing arbi-
`trary distinct images. Examples can be downloaded from https://shattered.io.
`PDFs with the same MD5 hash have previously been constructed by Gebhardt
`et al. [14] by exploiting so-called Indexed Color Tables and Color Transformation
`functions. However, this method is not eﬀective for many common PDF viewers
`that lack support for these functionalities. Our PDFs rely on distinct parsings
`of JPEG images, similar to Gebhardt et al.’s TIFF technique [14] and Albertini
`et al.’s JPEG technique [1]. Yet we improved upon these basic techniques using
`very low-level “wizard” JPEG features such that these work in all common PDF
`viewers, and even allow very large JPEGs that can be used to craft multi-page
`PDFs. This overall approach and the technical details will be described in a
`separate article [2].
`The remainder of this paper is organized as follows. We ﬁrst give a brief
`description of SHA-1 in Sect. 3. Then, we give a high-level overview of our attack
`in Sect. 4, followed by Sect. 5 that details the entire process and the cryptana-
`lytic techniques employed, where we also highlight improvements with respect
`to previous work. Finally, we discuss the large-scale distributed computations
`required to ﬁnd the two near-collision block pairs in Sect. 6. The parameters
`used to ﬁnd the second colliding block are given in the appendix, in Sect. A.
`
`3 The SHA-1 Hash Function
`
`We provide a brief description of SHA-1 as deﬁned by NIST [31]. SHA-1 takes an
`arbitrary-length message and computes a 160-bit hash. It divides the (padded)
`
`Page 5 of 27
`
`

`The First Collision for Full SHA-1
`
`575
`
`input message into k blocks M1, . . . , Mk of 512 bits. The 160-bit internal state
`CVj of SHA-1, called the chaining value, is initialized to a predeﬁned initial
`value CV0 = IV . Each message block is then fed to a compression function h
`that updates the chaining value, i.e. CVj+1 = h(CVj, Mj+1), for 0 ≤ j < k,
`where the ﬁnal CVk is output as the hash.
`The compression function h takes a 160-bit chaining value CVj and a 512-
`bit message block Mj+1 as inputs, and outputs a new 160-bit chaining value
`CVj+1. It mixes the message block into the chaining value as follows, operating
`on words, simultaneously seen as 32-bit strings and as elements of Z/232Z: the
`input chaining value is parsed as ﬁve words a, b, c, d, e, and the message block as
`16 words m0, . . . , m15. The latter are expanded into 80 words using the following
`recursive linear equation:
`for 16 ≤ i <80.
`mi = (mi−3 ⊕ mi−8 ⊕ mi−14 ⊕ mi−16)(cid:2)1,
`Starting from (A−4, A−3, A−2, A−1, A0) := (e(cid:2)2, d(cid:2)2, c(cid:2)2, b, a), each mi is mixed
`into an intermediate state over 80 steps i = 0, . . . , 79:
`, A(cid:3)2i−3) + A(cid:3)2i−4 + Ki + mi,
`
`
`Ai+1 = A(cid:2)5i + ϕi(Ai−1, A(cid:3)2
`
`i−2
`where ϕi and Ki are predeﬁned Boolean functions and constants:
`
`ϕi(x, y, z)
`Step i
`ϕIF = (x ∧ y) ∨ (¬x ∧ z)
`0 ≤ i <20
`20 ≤ i <40 ϕXOR = x ⊕ y ⊕ z
`40 ≤ i <60 ϕMAJ = (x ∧ y) ∨ (x ∧ z) ∨ (y ∧ z) 0x8f1bbcdc
`60 ≤ i <80 ϕXOR = x ⊕ y ⊕ z
`
`Ki
`
`0x5a827999
`
`0x6ed9eba1
`
`0xca62c1d6
`
`After the 80 steps, the new chaining value is computed as the sum of the
`input chaining value and the ﬁnal intermediate state:
`, d + A(cid:3)2
`CVj+1 = (a + A80, b + A79, c + A(cid:3)2
`
`, e + A(cid:3)2
`76 ).
`
`77
`
`78
`
`4 Overview of our SHA-1 Collision Attack
`
`We illustrate our attack from a high level in Fig. 1. Starting from identical chain-
`ing values for two messages, we use two pairs of blocks. The diﬀerences in the
`ﬁrst block pair cause a small diﬀerence in the output chaining value, which is
`canceled by the diﬀerence in the second block pair, leading again to identical
`chaining values and hence a collision (indicated by (2)). We employ diﬀerential
`paths that are a precise description of diﬀerences in state words and message
`words and of how these diﬀerences should propagate through the 80 steps.
`Note that although the ﬁrst ﬁve state words are ﬁxed by the chaining value,
`one can freely modify message words and thus directly inﬂuence the next sixteen
`
`Page 6 of 27
`
`

`576
`
`M. Stevens et al.
`
`Fig. 1. Attack overview
`
`state words. Moreover, with additional eﬀort this can be extended to obtain lim-
`ited inﬂuence over another eight state words. However, control over the remaining
`state words (indicated by (1)) is very hard and thus requires very sparse target
`diﬀerences that correctly propagate with probability as high as possible. Fur-
`thermore, these need to be compatible with diﬀerences in the expanded message
`words. The key solution is the concept of local collisions [5], where any state bit-
`diﬀerence introduced by a perturbation message bit-diﬀerence is to be canceled
`in the next ﬁve steps using correction message bit-diﬀerences.
`To ensure all message word bit diﬀerences are compatible with the linear
`message expansion, one uses a disturbance vector (DV) [5] that is a correctly
`expanded message itself, but where every “1” bit marks the start of a local
`collision. The selection of a good disturbance vector has a very high impact on
`the overall attack cost. As previously shown by Wang et al. [48], the main reason
`of using two block pairs (i.e. to search for a near-collision over a ﬁrst message
`block, that is completed to a full collision over a second) instead of only one
`is that this choice alleviates an important restriction on the disturbance vector,
`namely that there are no state diﬀerences after the last step. Similarly, it may be
`impossible to unite the input chaining value diﬀerence with the local collisions for
`an arbitrary disturbance vector. This was solved by Wang et al. [48] by crafting a
`tailored diﬀerential path (called the non-linear (NL) path, indicated by (3)) that
`over the ﬁrst 16 steps connects the input chaining value diﬀerences to the local
`collision diﬀerences over the remaining steps (called the linear path, referring to
`the linear message expansion dictating the local collision positions).
`One has to choose a good disturbance vector, then craft a non-linear diﬀer-
`ential path for each of the two near-collision attacks (over the ﬁrst and second
`message blocks), determine a system of equations over all steps and ﬁnally ﬁnd
`a solution in the form of a message block pair (as indicated by (4A) and (4B)).
`Note that one can only craft the non-linear path for the second near-collision
`attack once the chaining values resulting from the ﬁrst block pair are known.
`This entire process including our improvements is described below.
`
`5 Near-Collision Attack Procedure
`
`This section describes the overall procedure of each of the two near-collision
`attacks. Since we relied on our modiﬁcation of Stevens’ public source-code [17,43]
`
`Page 7 of 27
`
`

`The First Collision for Full SHA-1
`
`577
`
`DV selection
`
`Craft non-
`linear path
`
`Run attack
`
`Write attack
`algorithm
`
`Determine
`attack
`conditions
`
`Find
`speed-ups
`(boomerangs)
`
`Find
`additional
`conditions
`
`Fix
`solvability
`ﬁrst steps
`
`Fig. 2. The main steps for each near-collision attack.
`
`for the ﬁrst near-collision attack, we focus on our extended procedure for our
`second near-collision attack. As shown in Fig. 2, this involves the following steps
`that are further detailed below:
`
`1. selection of the disturbance vector (same for both attacks);
`2. construction of the non-linear diﬀerential path;
`3. determine attack conditions over all steps;
`4. ﬁnd additional conditions beyond the ﬁxed diﬀerential path for early-stop;
`5. if necessary ﬁx solvability of attack conditions over the ﬁrst few steps;
`6. ﬁnd message modiﬁcation rules to speed-up collision search;
`7. write the attack algorithm;
`8. ﬁnally, run the attack to ﬁnd a near-collision block pair.
`
`5.1 Disturbance Vector Selection
`
`The selection of which disturbance vector to use is a major choice, as it directly
`determines many aspects of the collision attack. These include the message XOR
`diﬀerences, but also in theory the optimal attack choices over the linear path,
`including the optimal set of candidate endings for the non-linear path together
`with optimal linear message-bit equations that maximize the success probability
`over the linear part.
`Historically several approaches have been used to analyze a disturbance vec-
`tor to estimate attack costs over the linear part. Initially, the Hamming weight of
`the DV that counts the active number of local collisions was used (see e.g. [4,35]).
`For the ﬁrst theoretical attack on SHA-1 with cost 269 SHA-1-calls by Wang
`et al. [48] a more reﬁned measure was used, that counts the number of bit-
`conditions on the state and message bits that ensure that the diﬀerential path
`would be followed. This was later reﬁned by Yajima et al. [51] to a more pre-
`cise count by exploiting all possible so-called bit compressions and interactions
`through the Boolean functions. However, this approach does not allow any dif-
`ference in the carry propagation, which otherwise could result in alternate diﬀer-
`ential paths that may improve the overall success probability. Therefore, Mendel
`et al. [28] proposed to use the more accurate probability of single local collisions
`where carry propagations are allowed, in combination with known local collision
`interaction corrections.
`
`Page 8 of 27
`
`

`578
`
`M. Stevens et al.
`
`The current state-of-the-art is joint-local-collision analysis (JLCA) intro-
`duced by Stevens [41,43] which given sets of allowed diﬀerences for each state
`word Ai and message word mi (given by the disturbance vector) computes the
`exact optimal success probability over the speciﬁed steps by exhaustively evalu-
`ating all diﬀerential paths with those allowed diﬀerences. This approach is very
`powerful as it also provides important information for the next steps, namely
`the set of optimal chaining value diﬀerences (by considering arbitrary high prob-
`ability diﬀerences for the last ﬁve Ais) and the set of optimal endings for the
`non-linear path, together with a corresponding set of message-bit equations,
`using which the optimal highest success probability of the speciﬁed steps can
`actually be achieved. The best theoretical collision attack on SHA-1 with cost
`261 SHA-1 calls [43] was built using this analysis. As we build upon this collision
`attack, we use the same disturbance vector, named II(52, 0) by Manuel [26] and
`originally described by Jutla and Patthak [20].
`
`5.2 Construction of a Non-linear Diﬀerential Path
`
`Once the disturbance vector and the corresponding linear part of the diﬀerential
`path have been ﬁxed, the next step consists in ﬁnding a suitable non-linear path
`connecting the chaining value pair (with ﬁxed diﬀerences) to the linear part.
`This step needs to be done separately for each near-collision attack of the full
`collision attack2.
`As explained for instance in [43], in the case of the ﬁrst near-collision attack,
`the attacker has the advantage of two additional freedoms. Firstly, an arbitrary
`preﬁx can be included before the start of the attack to pre-fulﬁll a limited number
`of conditions on the chaining value. This allows greater freedom in constructing
`the non-linear path as this does not have to be restricted to a speciﬁc value
`of the chaining value pair, whereas the non-linear path for the second near-
`collision attack has to start from the speciﬁc given value of input chaining value
`pair. Secondly, it can use the entire set of output chaining value diﬀerences with
`the same highest probability. The ﬁrst near-collision attack is not limited to a
`particular value and succeeds when it ﬁnds a chaining value diﬀerence in this set,
`whereas the second near-collision attack has to cancel the speciﬁc diﬀerence in
`the resulting chaining value pair. Theory predicts the ﬁrst near-collision attack
`to be at least a factor six faster than the second attack [43]. For our collision
`attack it is indeed the second near-collision attack that dominates the overall
`attack complexity.
`Historically, the ﬁrst non-linear paths for SHA-1 were hand-crafted by Wang
`et al. Several algorithms were subsequently developed to automatically search
`for non-linear paths for MD5, SHA-1, and other functions of the MD-SHA family.
`The ﬁrst automatic search for SHA-1 by De Canni`ere and Rechberger [8] was
`based on a guess-and-determine approach. This approach tracks the allowed
`
`2 We eventually produced two message block pair solutions for the ﬁrst near-collision
`attack. This provided a small additional amount of freedom in the search for the
`non-linear path of the second block.
`
`Page 9 of 27
`
`

`The First Collision for Full SHA-1
`
`579
`
`values of each bit pair in the two related compression function computations.
`It starts with no constraints on the values of these bit pairs other than the
`chaining value pair and the linear part diﬀerences. It then repeatedly restricts
`values on a selected bit pair and then propagates this information via the step
`function and linear message expansion relation, i.e., it determines and eliminates
`previously-allowed values for other bit pairs that are now impossible due the
`added restriction. Whenever a contradiction occurs, the algorithm backtracks
`and chooses a diﬀerent restriction on the last selected bit pair.
`Another algorithm for SHA-1 was introduced by Yajima et al. [52] that is
`based on a meet-in-the-middle approach. It starts from two fully-speciﬁed diﬀer-
`ential paths; the ﬁrst is obtained from a forward expansion of the input chaining
`value pair, whereas the other is obtained from a backward expansion of the linear
`path. It then tries to connect these two diﬀerential paths over the remaining ﬁve
`steps in the middle by recursively iterating over all solutions over a particular
`step.
`A similar meet-in-the-middle algorithm was independently ﬁrst developed for
`MD5 and then adapted to SHA-1 by Stevens et al. [17,41,45], which operates
`on bit-slices and is more eﬃcient. The open-source HashClash project [17] seems
`to be the only publicly available non-linear path construction implementation,
`which we improved as follows. Originally, it expanded a large set of diﬀerential
`paths step by step, keeping only the best N paths after each step, for some user-
`speciﬁed number N. However, there might be several good diﬀerential paths
`that result in the same diﬀerences and conditions around the connecting ﬁve
`steps, where either none or all lead to fully-connected diﬀerential paths. Since
`we only need the best fully-connected diﬀerential path we can ﬁnd, we only need
`to keep a best diﬀerential path from each subset of paths with the same diﬀer-
`ences and conditions over the last ﬁve steps that were extended. So to remove
`this redundancy, for each step we extend and keep, say, the 4N best paths, then
`we remove all such superﬂuous paths, and ﬁnally keep at most N paths. This
`improvement led to a small but very welcome reduction in the amount of diﬀer-
`ential path conditions under the same path construction parameter choices, but
`also allowed a better positioning of the largest density of suﬃcient conditions
`for the diﬀerential path.
`Construction of a very good non-linear path for the second near-collision
`attack using our improved HashClash version took a small eﬀort with our
`improvements, yet even allowed us to restrict the section with high density of
`conditions to just the ﬁrst six steps. However, to ﬁnd a very good non-linear
`diﬀerential path that is also solvable turned out to be more complicated. Our
`ﬁnal solution is described in Sect. 5.5, which in the end did allow us to build
`our attack on the best non-linear path we found without any compromises. The
`ﬁxed version of this best non-linear path is presented in Fig. 3, Sect. A.
`
`5.3 Determine Attack Conditions
`
`Having selected the disturbance vector and constructed a non-linear path that
`bridges into the linear part, the next step is to determine the entire system of
`
`Page 10 of 27
`
`

`580
`
`M. Stevens et al.
`
`equations for the attack. This system of equations is expressed entirely over the
`computation of message M (1), and not over M (2), and consists of two types of
`equations:
`
`1. Linear equations over message bits. These are used to control the additive
`signs of the message word XOR diﬀerences implied by the disturbance vector.
`Since there are many diﬀerent “signings” over the linear part with the same
`highest probability, instead of one speciﬁc choice one uses a linear hull that
`captures many choices to reduce the amount of necessary equations.
`2. Linear equations over state bits given by a ﬁxed diﬀerential path up to some
`step i (that includes the non-linear path). These control whether there is a
`diﬀerence in a state bit and which sign it has, furthermore they force target
`diﬀerences in the outputs of the Boolean functions ϕi.
`
`We determine this entire system by employing our implementation of joint-
`local-collision analysis that has been improved as follows. JLCA takes input sets
`of allowed diﬀerences for each Ai and mi and exhaustively analyzes the set of
`diﬀerential paths with those allowed diﬀerences, which originally is only used to
`analyze the linear part. We additionally provide it with speciﬁc diﬀerences for
`Ai and mi as given by the non-linear path, so we can run JLCA over all 80 steps
`and have it output an optimal ﬁxed diﬀerential path over steps 0, . . . , 22 together
`with an optimal set of linear equations over message bits over the remaining
`steps. These are optimal results since JLCA guarantees these lead to the highest
`probability that is possible using the given allowed diﬀerences, but furthermore
`that a largest linear hull is used to minimize the amount of equations.
`Note that having a ﬁxed diﬀerential path over more steps directly provides
`more state bit equations which is helpful in the actual collision search because we
`can apply an early-stop technique. However, this also adds further restrictions on
`Ai limiting a set of allowed diﬀerences to a single speciﬁc diﬀerence. In our case
`limiting A24 would result, besides a drop in degrees of freedom, in a lower overall
`probability, thus we only use a ﬁxed diﬀerential path up to step 22, i.e., up to
`A23. Below in Sect. 5.4 we show how we compensated for fewer state equations
`that the actual collision search uses to early stop.
`
`5.4 Find Additional State Conditions
`
`As explained in Sect. 5.3, the system of equations consists of linear equations
`over (expanded) message bits and linear equations over state bits. In the actual
`collision search algorithm, we depend on these state bit equations to stop com-
`putation on a bad current solution as early as possible and start backtracking.
`These state bit equations are directly given by a ﬁxed diﬀerential path, where
`every bit diﬀerence in the state and message is ﬁxed. Starting from step 23 we
`allow several alternate diﬀerential paths that increase success probability, but
`also allow distinct message word diﬀerences that lead to a decrease in the overall
`number of equations. Each alternate diﬀerential path depends on its own (dis-
`tinct) message word diﬀerences and leads to its own state bit equations. To ﬁnd
`
`Page 11 of 27
`
`

`The First Collision for Full SHA-1
`
`581
`
`additional equations, we also consider linear equations over state and message
`bits around steps 21–25. Although in theory these could be computed by JLCA
`by exhaustively reconstructing all alternate diﬀerential paths and then deter-
`mining the desired linear equations, we instead took a much simpler approach.
`We generated a large amount of random solutions of the system of equations
`up to step 31 using an unoptimized general collision search algorithm. We then
`proceeded to exhaustively test potential linear equations over at most four state
`bits and message bits around steps 21–25, which is quite eﬃcient as on average
`

This document is available on Docket Alarm but you must sign up to view it.

Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

Up-to-date information for this case.
Email alerts whenever there is an update.
Full text search for other cases.
Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.

Access Government Site

We are redirecting you
to a mobile optimized page.

Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket

Supplemental Search

Search for PTAB Motions

PTAB Analytics

TTAB Analytics

Basic Search

Filters

Party Search

Advanced

Selected Courts

Recently Selected Courts

Find PTAB Decisions

PTAB Analytics

Special PTAB Alerts

Orange Book

Directly Search Federal Courts

Search Trademark ...

This document is available on Docket Alarm but you must sign up to view it.

Accessing this document will incur an additional charge of $.

Still Working On It

A few More Minutes ... Still Working

This document could not be displayed.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

One Moment Please

Your document is on its way!

Sealed Document

We are redirecting youto a mobile optimized page.

Document Unreadable or Corrupt

We are unable to display this document.

STEP 2 of 2

Choose your membership type

Flat-Fee

Pay-As-You-Go

Add your payment information

Login or Join

Enter your corporate Email

Thousands of your peers are saving time and gaining a competitive advantage with Docket Alarm.

Join Docket Alarm to perform smarter legal research.

Download this document and millions of others instantly with a Docket Alarm membership.

Join Docket Alarm and start performing smarter legal research.

Start tracking this docket instantly with a Docket Alarm membership.

Join thousands of your peers and start performing smarter legal research.

STEP 1 of 2

Millions of Documents | 15 Seconds to Signup

Hi !

Welcome to Docket Alarm

Welcome to Docket Alarm!

Explore Litigation Insights andManage Your Cases

Reset Password

What is PACER?

Why do I need it?

What will I be charged?

Do other courts have fees?

Basic Free Access

Welcome

Thank you

Check Firm Account

We are redirecting you
to a mobile optimized page.

Explore Litigation Insights and
Manage Your Cases