`
`2
`
`3
`
`4
`
`5
`
`6
`
`7
`
`8
`
`9
`
`10
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`IN THE UNITED STATES PATENT AND TRADEMARK OFFICE
`
`FILED VIA THE PATENT REVIEW PROCESSING SYSTEM
`
`In re Inter Partes Review of:
`U.S. Patent No. 6,757,717
`
`Issued: June 29, 2004
`
`Applicant: Leonid Goldstein
`
`Application No. 09/398,007
`
`Filed: September 16, 1999
`
`Title: System And Method For Data
`
`Access
`
`Currently in Litigation Styled:
`Proxyconn Inc. v. Microsoft
`Corporation, et al., Central District
`of California, Case No. SA CV11-
`1681 DOC (ANx) [Consolidated
`With Case Nos. SA CV11-1682
`DOC (ANx), SA CV11-1683 DOC
`(ANx), and SA CV11-1684 DOC
`(ANx)]
`
`
`
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`)
`
`Declaration of Professor Darrell D. E. Long
`Regarding U.S. Patent No. 6,757,717
`
`I.
`
`QUALIFICATIONS
`
`I am a Professor of Computer Science and have served as Associate
`
`Dean for Research and Graduate Studies in the Jack Baskin School of
`
`Engineering at the University of California at Santa Cruz. I hold the Kumar
`
`Malavalli Endowed Chair of Storage Systems Research and I am the Director
`
`1
`
`MICROSOFT
`
`EXHIBIT 1007
`
`
`
`
`
`
`
`1
`
`of the Storage Systems Research Center, an internationally recognized center
`
`2
`
`of excellence in data storage. I am also the Director of the Working-group on
`
`3
`
`Applied Security and Privacy (WASP), the laboratory at the University of
`
`4
`
`California at Santa Cruz that studies computer security. I teach graduate and
`
`5
`
`undergraduate courses in computer security, operating systems, data storage
`
`6
`
`and have taught courses in networking and distributed systems. I received my
`
`7
`
`B.S. degree in Computer Science from San Diego State University, and my
`
`8
`
`M.S. and Ph.D. from the University of California, San Diego. I am a Fellow
`
`9
`
`of the Institute of Electrical and Electronics Engineers and of the American
`
`10
`
`Association for the Advancement of Science. My research interests include
`
`11
`
`data storage systems, operating systems, computer security, distributed
`
`12
`
`systems and networking. My qualifications are further described in my
`
`13
`
`appended Curriculum Vitae.
`
`14
`
`
`
`I have published numerous papers including in the ACM Transactions
`
`15
`
`on Storage, and various IEEE journals, and I am the co-author of two books.
`
`16
`
`These publications are listed in Exhibit A. I am the founder of the premier
`
`17
`
`conference in the data storage field known as the Symposium on File Storage
`
`18
`
`Technologies (“FAST”). I have participated in organizing numerous
`
`19
`
`academic conferences including:
`
`20
`
`2
`
`
`
`
`
`
`
`1
`
`2
`
`2012:
`
`Steering Committee: Petascale Data Storage Workshop (PDSW),
`
`3
`
`Symposium on Modeling, Analysis and Simulation of Computer and
`
`4
`
`Telecommunication Systems (MASCOTS), Symposium on File and Storage
`
`5
`
`Systems Technology (FAST).
`
`6
`
`Program Committee: Symposium on File and Storage Systems
`
`7
`
`Technology (FAST).
`
`8
`
`9
`
`2011:
`
`Steering Committee: Petascale Data Storage Workshop (PDSW),
`
`10
`
`Symposium on Modeling, Analysis and Simulation of Computer and
`
`11
`
`Telecommunication Systems (MASCOTS), Symposium on File and Storage
`
`12
`
`Systems Technology (FAST).
`
`13
`
`Program Committee: Symposium on Modeling, Analysis and
`
`14
`
`Simulation of Computer and Telecommunication Systems (MASCOTS).
`
`15
`
`16
`
`2010:
`
`Program Chair: Symposium on Modeling, Analysis and Simulation of
`
`17
`
`Computer and Telecommunication Systems (MASCOTS).
`
`18
`
`Steering Committee: Petascale Data Storage Workshop (PDSW),
`
`19
`
`Symposium on Modeling, Analysis and Simulation of Computer and
`
`20
`
`3
`
`
`
`
`
`
`
`1
`
`Telecommunication Systems (MASCOTS), Symposium on File and Storage
`
`2
`
`Systems Technology (FAST).
`
`3
`
`4
`
`2009:
`
`Program Committee: International Workshop on Software Support for
`
`5
`
`Portable Storage (IWSSPS), Inaugural International Conference on
`
`6
`
`Virtualization and Cloud Computing, Symposium on Modeling, Analysis and
`
`7
`
`Simulation of Computer and Telecommunication Systems (MASCOTS),
`
`8
`
`Petascale Data Storage Workshop (PDSW).
`
`9
`
`10
`
`11
`
`Program Chair: Web Information Systems Engineering (WISE).
`
`General Chair: Symposium on Applications and the Internet (SAINT).
`
`Steering Committee: Symposium on Modeling, Analysis and
`
`12
`
`Simulation of Computer and Telecommunication Systems (MASCOTS),
`
`13
`
`Symposium on File and Storage Systems Technology (FAST).
`
`14
`
`I have also consulted for industry in the area of storage systems
`
`15
`
`including for Hewlett-Packard Laboratories and IBM. I have also been a
`
`16
`
`consultant to numerous government agencies.
`
`17
`
`II. COMPENSATION
`
`18
`
`I am being compensated by counsel for Microsoft at my compensation
`
`19
`
`rate of $500/hour for consulting and $600/hour for testimony in deposition or
`
`20
`
`4
`
`
`
`
`
`
`
`1
`
`trial, plus reimbursement for reasonably incurred expenses. I have no interest
`
`2
`
`in the outcome of the related litigation or this proceeding.
`
`3
`
`4
`
`III. SUMMARY OF MY STUDY AND CONCLUSIONS
`
`I have read U.S. Patent No. 6,757,717. The patent concerns
`
`5
`
`technology within my areas of expertise. I have considered the patent’s
`
`6
`
`disclosures from the perspective of a person of ordinary skill in the art in
`
`7
`
`1998-99.
`
`8
`
`I have studied the following references and considered them from the
`
`9
`
`perspective of the person of ordinary skill in the art in 1998-99.
`
`10
`
`Perlman: Radia J. Perlman et al., U.S. Patent No. 5,742,820,
`
`11
`
`“Mechanism for Efficiently Synchronizing Information Over a Network,”
`
`12
`
`issued Apr. 21, 1998 (“’820” or “Perlman”).
`
`13
`
`Yohe: Thomas Patrick Yohe et al., U.S. Patent No. 5,835,943,
`
`14
`
`“Apparatus and Method for Increased Data Access in a Network File
`
`15
`
`Oriented Caching System,” issued Nov. 10, 1998 on an application filed July
`
`16
`
`3, 1997 (“’943” or “Yohe”).
`
`17
`
`Santos: Jonathan Santos et al., “USENIX, Increasing Effective Link
`
`18
`
`Bandwidth by Suppressing Replicated Date,” Proceedings of the USENIX
`
`19
`
`Annual Technical Conference (NO 98) New Orleans, Louisiana, June 1998
`
`20
`
`(“Santos”).
`
`5
`
`
`
`
`
`
`
`1
`
`I have compared these references to claims 1, 3, 10-12, 14 and 22-24
`
`2
`
`of the ’717 patent. I also have reviewed Proxyconn’s arguments (in an
`
`3
`
`“interrogatory” response) attempting to distinguish the claims from these
`
`4
`
`references.
`
`5
`
`I have considered the perspective of the person of ordinary skill in the
`
`6
`
`art in 1998-99 (defined below) who was designing a system in which
`
`7
`
`dynamic data is sent over a network from some source and is stored by a
`
`8
`
`receiver computer for possible later reuse, and having a design goal of
`
`9
`
`reducing the transmission of redundant data over the network. I have
`
`10
`
`considered such a person having read Perlman with an eye toward using and
`
`11
`
`possibly expanding on its teachings. I have also considered such a person
`
`12
`
`having read Yohe and looking to use and possibly expand on its teachings.
`
`13
`
`These nine ’717 patent claims recite nothing innovative compared to
`
`14
`
`these references. Perlman discloses to the person of ordinary skill in the art
`
`15
`
`in 1998-99 everything required by these claims. Yohe likewise discloses
`
`16
`
`everything, except for bundling multiple digital fingerprints in the same
`
`17
`
`message, which Perlman does disclose. Also, the natural combination of
`
`18
`
`Perlman and Yohe taught the person of skill in the art the combinations
`
`19
`
`claimed in these claims. In other words, the person of skill in the art would
`
`20
`
`have already possessed the claimed subject matter upon reading Perlman and
`
`6
`
`
`
`
`
`
`
`1
`
`Yohe, combined with their knowledge of the conventional technology in the
`
`2
`
`field, as explained below.
`
`3
`
`Also, the patent is internally inconsistent and unclear on the meaning
`
`4
`
`of “digital digest,” as explained below. For purposes of my comparison of
`
`5
`
`the claims to the prior art references, however, I have assumed that this term
`
`6
`
`includes a fixed-size digital fingerprint (e.g., hash, message digest, signature
`
`7
`
`or identifier), of 32 or more bits, calculated using an MD5 and/or CRC
`
`8
`
`algorithm and calculated on arbitrary-size data, such that it represents and
`
`9
`
`depends only on the content of that data.
`
`10
`
`IV. FIELD OF THE INVENTION
`
`11
`
`The ’717 patent defines its “field of the invention” as accessing data in
`
`12
`
`communication networks. (’717, 1:10-15). The field also includes the areas
`
`13
`
`of distributed data storage systems and networking, coding theory including
`
`14
`
`error detection and correction codes, and cryptographic hash functions
`
`15
`
`commonly called message digest functions. These were all mature fields in
`
`16
`
`1998-99.
`
`17
`
`V. LEVEL OF SKILL IN THE ART IN 1998-99
`
`18
`
`A person of ordinary skill in this art in 1998-99 would hold a B.S.
`
`19
`
`degree in computer science and would have as part of his study courses in
`
`20
`
`operating systems, networking, data compression and computer security.
`
`7
`
`
`
`
`
`
`
`1
`
`These studies would include the storage subsystem of computer operating
`
`2
`
`systems which is covered briefly in most undergraduate operating systems
`
`3
`
`courses, but few require the student to examine actual source code. In
`
`4
`
`addition he would have several years of practical experience working in
`
`5
`
`operating systems, in particular the data storage subsystem.
`
`6
`
`As a result, actual experience in working with this operating system
`
`7
`
`subsystem would normally occur after several years of experience working
`
`8
`
`for a company with a focus on systems software.
`
`9
`
`Alternatively, a person would develop the level of ordinary skill in the
`
`10
`
`art in 1998-99 by obtaining an M.S. in computer science and by writing his or
`
`11
`
`her thesis in an area related to data storage and/or computer security.
`
`12
`
`A person of ordinary skill in the art would understand network
`
`13
`
`protocols. This was normally part of undergraduate programs in computer
`
`14
`
`science in 1998-99. A person of ordinary skill in the art would also
`
`15
`
`understand coding theory; in particular error detection and correction codes,
`
`16
`
`as well as cryptographic hash functions and message digest functions.
`
`17
`
`Introduction to basic hash functions is a normal part of most undergraduate
`
`18
`
`curricula, but coding theory is normally part of specialized courses (although
`
`19
`
`it is commonly part of electrical engineering programs), and cryptographic
`
`20
`
`8
`
`
`
`
`
`
`
`1
`
`hash functions would normally be taught only in courses in computer
`
`2
`
`security.
`
`3
`
`I have first-hand experience teaching and working with such persons of
`
`4
`
`ordinary skill in the art. For example, I have taught students having about that
`
`5
`
`level of skill in this art since at least as early as 1990.
`
`VI. PERLMAN AND YOHE TEACH
`
`THE CLAIMED SUBJECT MATTER
`
`A.
`
`Perlman and Yohe Are A Natural Combination
`
`There are multiple reasons why a PHOSITA looking to use Perlman
`
`would have looked to the teachings of Yohe and, conversely, one looking to
`
`use Yohe would have looked to the teachings of Perlman.
`
`Same Problem: First, each reference addresses the same specific
`
`communications problem addressed in the ’717 patent: the desire to reduce
`
`redundancy in network data transmissions where dynamic data previously
`
`sent over the network has been stored by the receiver for possible later reuse.
`
`Each addresses how to validate that stored (cached) copy to confirm that it
`
`remains synchronized with the source data at the sender. “Another object of
`
`the present invention is to decrease data traffic throughout the network.”
`
`(’717, 1:61-65). “[I]t is among the objects of the present invention to reduce
`
`the bandwidth consumed by transmission of database summary information
`
`6
`
`7
`
`8
`
`9
`
`10
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`9
`
`
`
`
`
`
`
`1
`
`packets over a computer network.” (’820, 3:52-55). “If a copy of the data
`
`2
`
`can be stored in the permanent storage memory of the remote client computer
`
`3
`
`and also verified to be current when it is subsequently retrieved, this will
`
`4
`
`improve performance significantly.” (’943, 4:32-40). This common problem
`
`5
`
`would have motivated a PHOSITA to consider these references together.
`
`6
`
`Same Algorithm: Second, each reference proposes the same
`
`7
`
`algorithmic solution to this common problem. Each uses the identity-check
`
`8
`
`property of message digest fingerprints or CRC values. Each transmits those
`
`9
`
`fingerprints from sender to receiver for the receiver to compare to its own
`
`10
`
`digital fingerprint values. Each returns a negative indication from receiver to
`
`11
`
`sender if there is no match, upon which the sender returns the underlying
`
`12
`
`data. This common algorithmic solution would have motivated a PHOSITA
`
`13
`
`to consider these references together.
`
`14
`
`Overlapping Applications: Third, each reference teaches that its
`
`15
`
`techniques can be used with network routers. Yohe discloses that its cache
`
`16
`
`verifying computer and communications server may be formed integrally
`
`17
`
`together (’943, claim 7) and that its communication server could be a Cisco
`
`18
`
`or other router (id., 4:52-56). Perlman’s illustrative embodiment is a router.
`
`19
`
`This overlap of a common application would have motivated a PHOSITA to
`
`20
`
`consider these references together.
`
`10
`
`
`
`
`
`
`
`1
`
`Perlman’s Express Direction: Fourth, Perlman expressly discloses and
`
`2
`
`claims that its mechanisms are not limited to routers but apply to any network
`
`3
`
`nodes (which includes Perlman’s source nodes and destination nodes (’820,
`
`4
`
`5:41-46, Fig. 2)) needing such synchronization between their database
`
`5
`
`contents. (’820, Abstract, 1:1-9, 3:61-4:4, 7:12-22, 8:52-9:2, claims 1, 4-6, 8,
`
`6
`
`10.) This broad teaching would have motivated a PHOSITA to consider
`
`7
`
`these references together.
`
`8
`
`Nothing Incompatible: Finally, nothing in either of these references
`
`9
`
`teaches away from their combination. In particular, as explained below, the
`
`10
`
`few implementation differences between these references are mere design
`
`11
`
`choices arising from the particular applications being illustrated. There is no
`
`12
`
`essential feature in Perlman in conflict with an essential feature of Yohe.
`
`B.
`
`
`The Designer Naturally Would Have
`Combined Perlman And Yohe In The Claimed Manner
`
`Perlman discloses everything recited in ’717 patent claims 1, 3, 10-12
`
`and 22-24. Perlman discloses the only element in any challenged claim
`
`arguably missing from Yohe (bundling plural “digital digests” in the same
`
`message, per claim 24). Yohe discloses the only element in any challenged
`
`claim missing from Perlman’s illustrative embodiment (searching for “digital
`
`digests” in the receiver’s permanent memory, per claim 23). As noted, this
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`11
`
`
`
`
`
`
`
`1
`
`permanent-memory element also is disclosed in Perlman’s broader
`
`2
`
`disclosure. I discuss below four claim features that Proxyconn might argue
`
`3
`
`would be missing from the combination of Perlman and Yohe.
`
`4
`
`5
`
`Use of Receiver’s Permanent Memory:
`
`Perlman discloses that the receiver computer has a hard disk or the like
`
`6
`
`for persistent storage of data, which the ’717 patent refers to as permanent
`
`7
`
`memory. Specifically, Perlman discloses that its illustrative-embodiment
`
`8
`
`router is a “general-purpose” computer with an operating system, only
`
`9
`
`“portions of which” are resident in RAM (’820, 5:41-52, Fig. 2). A person of
`
`10
`
`ordinary skill in the art in 1998-99 would have understood this term “general-
`
`11
`
`purpose computer” to refer to a computer having, among other things, a hard
`
`12
`
`disk or the like for storing persistently data and application programs. Any
`
`13
`
`“general-purpose computer” in 1998-99 was capable of storing data (e.g., a
`
`14
`
`digital fingerprint) in its permanent memory, e.g., as part of its operating
`
`15
`
`system or an application program or data files, etc. And, the person of
`
`16
`
`ordinary skill would have understood that the portion of the operating system
`
`17
`
`in Perlman not resident in RAM would necessarily have been stored on disk.
`
`18
`
`Further, Perlman expressly discloses that its mechanism applies to
`
`19
`
`“any type” of distributed network system needing to synchronize the database
`
`20
`
`on one node with that of another. (’820, 8:52-9:2). The person of skill in the
`
`12
`
`
`
`
`
`
`
`1
`
`art in 1998-99 would have understood that this “any type” includes the Yohe
`
`2
`
`system and other such systems whose computers have a hard disk or other
`
`3
`
`persistent storage.
`
`4
`
`Perlman’s illustrative-embodiment general-purpose computer is
`
`5
`
`running a link state routing protocol requiring rapid propagation of routing
`
`6
`
`table updates, and thus naturally stores its digital fingerprints in RAM, not in
`
`7
`
`the receiver’s permanent memory (see ’717, claim 10), and searches for data
`
`8
`
`with a particular digital fingerprint in the receiver’s RAM not its permanent
`
`9
`
`memory (see ’717, claim 23). But, as discussed, Perlman expressly discloses
`
`10
`
`that its mechanism is not restricted to this network routing application but
`
`11
`
`instead applies to “any type” of distributed system requiring synchronization
`
`12
`
`of databases on nodes of a network. This “any type” includes systems using
`
`13
`
`a cache stored on disk or other permanent-storage device.
`
`14
`
`While Perlman’s illustrative router embodiment stores its cached data
`
`15
`
`in RAM, Yohe’s receiver stores its cache data in part on disk. This difference
`
`16
`
`naturally follows from their different illustrative applications. A skilled
`
`17
`
`designer knew in 1998-99 that a cache could be stored in volatile (non-
`
`18
`
`persistent) memory, or persistent memory, or in both types of memory.
`
`19
`
`Those are the only choices. Often, a general-purpose computer had both
`
`20
`
`persistent and non-persistent memory, as in Yohe, for example. Each choice
`
`13
`
`
`
`
`
`
`
`1
`
`has tradeoffs versus the other, and the particular application typically
`
`2
`
`recommends use of one or the other, or both. For example, read/write speed
`
`3
`
`is of utmost importance in the routing table synchronization application of
`
`4
`
`Perlman’s illustrative embodiment, not storage capacity, and routers typically
`
`5
`
`are rebooted only rarely, so its cache naturally was in volatile RAM. On the
`
`6
`
`other hand, Yohe’s illustrative application of synchronizing entire file
`
`7
`
`systems (on client computers that might be rebooted frequently),
`
`8
`
`recommended the greater storage capacity and persistence of storing the
`
`9
`
`cache on disk not solely in RAM.
`
`10
`
`As noted, Perlman expressly discloses that its mechanism is not limited
`
`11
`
`to routers but instead applies to any network node in a distributed system that
`
`12
`
`needs to synchronize databases on the various nodes. (’820, 8:52-9:2).
`
`13
`
`(Perlman uses “database” in a broad sense. The illustrative “database” in
`
`14
`
`Perlman is a routing table.) Consistent with this teaching, at least five claims
`
`15
`
`of Perlman (claims 1 and 4-7) broadly apply to any nodes of a computer
`
`16
`
`network. Therefore, it not only would be natural and commonsense to use
`
`17
`
`Perlman’s mechanisms with nodes using a persistent-memory (as in Yohe),
`
`18
`
`Perlman itself teaches doing so.
`
`19
`
`20
`
`This conclusion that a person of skill in the art would naturally have
`
`applied Perlman’s teachings to Yohe and similar applications using persistent
`
`14
`
`
`
`
`
`
`
`1
`
`memory for cache, is bolstered by the ’717 patent itself. The patent notes that
`
`2
`
`the receiver uses its permanent memory or its first memory or both for its
`
`3
`
`cache (’717, 7:27-29) and contains no disclosure of any particular functional
`
`4
`
`or structural differences or benefits between these options. In other words,
`
`5
`
`the ’717 patent does not suggest some unexpected results from using the
`
`6
`
`identity-check algorithm with data stored in a persistent cache vs. non-
`
`7
`
`persistent cache. The mathematical algorithms and network transmission
`
`8
`
`mechanisms are the same whether the cached data survives a reboot (i.e., is
`
`9
`
`on disk) or not (i.e., is only in RAM).
`
`10
`
`11
`
`Bundling Plural Fingerprints In Single Message:
`
`Perlman discloses bundling two or more levels of signatures in a single
`
`12
`
`message—e.g., a high-level signature of a data set and lower-level signatures
`
`13
`
`of fragments of the data set. This is done so that the receiver can check for
`
`14
`
`matches for each fragment and request only the unmatched fragments. (’820,
`
`15
`
`8:7-42, 8:52-9:2, Fig. 7). Yohe does not describe this element, but Perlman
`
`16
`
`does.
`
`17
`
`Just as Perlman saw a benefit of bundling multiple digital fingerprints
`
`18
`
`for multiple data items in the same message, so too would have the person
`
`19
`
`having ordinary skill in the art in 1998-99 starting with the Yohe design.
`
`20
`
`This would have been a natural and commonsense variation on Yohe, for a
`
`15
`
`
`
`
`
`
`
`1
`
`person having ordinary skill in the art in 1998-99, for at least the following
`
`2
`
`reasons.
`
`3
`
`First, both Yohe’s receiver and sender already have the capability of
`
`4
`
`performing this step of calculating and bundling multiple signatures together.
`
`5
`
`Specifically, each has a block signature generator capable of calculating such
`
`6
`
`fragment signatures. (’943, Fig. 2).
`
`7
`
`Second, Yohe’s sender already calculates CRC or MD5 signatures of
`
`8
`
`each block of a directory. (’943, Fig. 16).
`
`9
`
`Third, Yohe’s receiver has a comparator capable of comparing CRC or
`
`10
`
`MD5 signatures, whatever data may have been represented by those
`
`11
`
`signatures. (’943, Fig. 2). Thus, the only change to Yohe would be for its
`
`12
`
`sender to transmit the block signatures it calculates, together with the
`
`13
`
`directory signatures, something it was already capable of doing.
`
`14
`
`Fourth, the motivation and teaching to make this minor change is
`
`15
`
`expressed in Perlman, which teaches that sending multiple levels of
`
`16
`
`identifiers in the same message “conserves processing resources.” (’820,
`
`17
`
`8:43-51).
`
`18
`
`Fifth, as noted, Perlman expressly teaches that its mechanism may be
`
`19
`
`used in “any type” of distributed system requiring synchronization of
`
`20
`
`16
`
`
`
`
`
`
`
`1
`
`databases stored on the nodes of a network (’820, 8:52-9:2), which disclosure
`
`2
`
`encompasses Yohe’s distributed file system application.
`
`3
`
`Thus, a person having ordinary skill in the art in 1998-99 would have
`
`4
`
`combined this element of Perlman with Yohe in the way recited in the
`
`5
`
`challenged claims, would have had a reasonable expectation of success from
`
`6
`
`that combination, and that combination would have yielded the predictable
`
`7
`
`result explained in Perlman. For the above reasons, this combination of
`
`8
`
`Perlman and Yohe in this manner is no more than a predictable use of prior
`
`9
`
`art elements according to their established functions. A system designer of
`
`10
`
`ordinary skill wishing to improve the efficiency of network communications
`
`11
`
`would have seen a benefit to sending and comparing multiple levels of digital
`
`12
`
`fingerprints in the same message.
`
`13
`
`14
`
`Length of the Digital Fingerprint:
`
`Yohe’s illustrative embodiment uses 32-bit and 64-bit signatures. A
`
`15
`
`person of skill in the art, whose anticipated applications called for use of
`
`16
`
`longer fingerprints, would naturally have used a 128-bit fingerprint instead,
`
`17
`
`as in Perlman. First, Yohe itself states that a 128-bit MD5 signature could be
`
`18
`
`used. Second, if a particular application required a greater length of signature
`
`19
`
`to avoid collisions, then the skilled designer would have had every reason to
`
`20
`
`do that to avoid collisions, again as taught in Perlman and Yohe (and Santos).
`
`17
`
`
`
`
`
`
`
`1
`
`2
`
`Network Cache Memory:
`
`Although Perlman does not use the word “cache,” both Perlman and
`
`3
`
`Yohe disclose this element. Each discloses storing data in a logical part of
`
`4
`
`memory or storage storing copies of data received from the network, for
`
`5
`
`possible re-use. Further, a person of skill in the art would have been familiar
`
`6
`
`with the local and proxy caching prior art discussed in the ’717 patent (’717,
`
`7
`
`Figs. 1-2, 1:18-26), which required that the cached data be “verified to be
`
`8
`
`valid.” Perlman and Yohe each teaches a technique for verifying that saved
`
`9
`
`data previously received over the network is still valid, i.e, still the same as
`
`10
`
`the counterpart data at the data’s source. A skilled designer in 1998-99
`
`11
`
`looking to verify such data validity while reducing the amount of network
`
`12
`
`traffic naturally would have looked to Yohe and Perlman—each of which is
`
`13
`
`directed to the same design objective. This combination was particularly
`
`14
`
`natural given that some prior art proxy caches already used MD5 hashes on
`
`15
`
`the contents of a data object as its index for the cache’s contents, as disclosed
`
`16
`
`in, e.g., USP 6,128,623.
`
`17
`
`18
`
`19
`
`20
`
`18
`
`
`
`
`
`
`
`1
`
`2
`
`3
`
`4
`
`5
`
`6
`
`7
`
`8
`
`9
`
`10
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`VII. ’717 PATENT IS UNCLEAR ON
`
`MEANING OF “DIGITAL DIGEST”
`
`A. Hash Function And Message Digest
`
`A function is a computation that takes a value, say x, and maps it onto
`
`another value y. The value x is chosen from a set called the domain and y is
`
`chosen from a set called the range. A function may be one-to-one where for
`
`each y there is exactly one x, or many-to-one where many x1, x2, …, may all
`
`map to the same y. A hash function f is one that takes a key k, and maps it to
`
`exactly one of at most n values. That is, Ͳ ݂ሺ݇ሻ ൏ ݊.
`
`A message digest H is a form of one-way hash function that takes a
`
`message M of arbitrary length and produces a hash value of a fixed number of
`
`bits (“bit” stands for binary digit and has a value of 0 or 1.) If ܪሺܯሻ ൌ ݄
`then ݄ ൏ ʹ where m is the length in bits of the hash value of this hash
`
`function. Widely used message digest functions in 1998-99 included MD5,
`
`which produces 128 bits and SHA-1 that produces 160 bits. CRC-32 and
`
`CRC-64 can be thought of as particularly small message digest functions,
`
`although their original purpose was error detection and, as discussed below,
`
`they are not collision resistant.
`
`19
`
`
`
`
`
`
`
`1
`
`2
`
`B. Mathematical Properties Of A Message Digest
`
`Just as human fingerprints are expected to be unique to individuals, the
`
`3
`
`message digest of a given message (e.g., a string or block of data) ideally is
`
`4
`
`unique to that message. The ability of a hash function to produce a unique
`
`5
`
`fingerprint of the message depends in part on the number of bits that the
`
`6
`
`message digest produces and on the number of messages that may be hashed.
`
`7
`
`If the number of messages is sufficiently large, and the number of bits
`
`8
`
`produced by the message digest function insufficiently low, then a collision
`
`9
`
`will occur and multiple messages will produce the same hash value.
`
`10
`
`Message digest functions have the following additional properties that
`
`11
`
`make them one-way and thus desirable for a designed purpose (see Bruce
`
`12
`
`Schneier, Applied Cryptography (2nd ed. 1996) at p. 429 (Exhibit C)):
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`1.
`
`Given a message M, it is easy to compute h. The computing of a
`
`message digest must be efficient, since they are typically applied to large
`
`amounts of data.
`
`2.
`
`Given h, it is hard to compute M such that H(M) = h. It must be
`
`computationally hard, that is nothing short of exhaustive search, to find
`
`the original message M given h. In fact, since M contains more bits than
`
`h, information is lost and so there are many messages ܯԢ that hash to the
`
`same h.
`
`20
`
`
`
`
`
`
`
`1
`
`2
`
`3
`
`4
`
`5
`
`6
`
`7
`
`8
`
`9
`
`10
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`Given M, it is hard to find another ܯԢ such that ܪሺܯሻ ൌ
`3.
` ܪሺܯᇱሻ. Even though there are many ܯԢ that hash to the same h, m (the
`
`fixed length in bits of the hash value) should be sufficiently large that it is
`
`difficult to find such an ܯԢ and they are extraordinarily unlikely to occur
`
`accidentally.
`
`One of the uses of a message digest function is to determine if any bits
`
`in a message M have been changed. For a well-designed message digest, even
`
`changing a single bit will result in a vastly different value being returned by
`
`the message digest computation. It is not the case that two messages that are
`
`close (e.g., a small Hamming or Levenshtein (also called edit) distance apart)
`
`will have message digests that are also close. While a poorly designed hash
`
`function may result in clustering in the range, there were no known message
`
`digest functions in 1998-99 where two messages that are close in terms of
`
`edit distance will result in hash values that are close either numerically or in
`
`terms of Hamming distance. As a result, none of CRC-32, CRC-64, MD4,
`
`MD5, or any of the SHA family can be used to determine similarity between
`
`two messages. A well-designed hash function is flat, that is all values
`
`Ͳ ݄ ൏ ݊ are equally likely to occur, regardless of the distribution of input
`
`messages.
`
`21
`
`
`
`
`
`
`
`1
`
`2
`
`C. Collision Resistance
`
`One of the properties that typically we desire in a message digest
`
`3
`
`function is called collision resistance. A message digest function is collision
`
`4
`
`resistant if two unrelated messages are unlikely to result in the same hash
`
`5
`
`value. A requirement that the hash function produces a nearly flat distribution
`
`6
`
`is a requirement, but it also requires that the function produce a relatively
`
`7
`
`large number of bits. By modern standards, even MD5’s 128 bits are
`
`8
`
`considered insufficient to be deemed collision resistant. The 32 bits of CRC-
`
`9
`
`32 would not have been considered sufficient for collision resistance in 1998-
`
`10
`
`99 for practical data storage systems, nor would the 64 bits of CRC-64 except
`
`11
`
`for the smallest of data storage systems.
`
`12
`
`13
`
`D. Calculating Collision Probabilities
`
`Hash collisions often are the result of a well-known problem in
`
`14
`
`probability theory known as the birthday problem, and cryptographic attacks
`
`15
`
`on these functions typically take the form of birthday attacks. The birthday
`
`16
`
`problem is: Given a room full of people, how many people need to be at the
`
`17
`
`party before the likelihood of two people sharing a birthday is larger than ½?
`
`18
`
`Assuming that birthdays are evenly spread over the course of a year, the
`
`19
`
`answer is surprisingly small: it is only about 23. This applies to hash
`
`20
`
`collisions as well as birthdays, and we ask the question: How many messages
`
`22
`
`
`
`
`
`
`
`1
`
`must be hashed before the likelihood of a collision is greater than some value.
`
`2
`
`The next question is how high of a chance of collision is considered
`
`3
`
`acceptable? A typical acceptable value might be 1 in 1015 (1 in
`
`4
`
`1,000,000,000,000,000) for a small data storage system, while for a large data
`
`5
`
`center the value would be much lower (since there are many more files that
`
`6
`
`might collide). One might ask where 1 in 1015 comes from: it is related to the
`
`7
`
`undetected error rate in modern hard drives and reflects the capability of a
`
`8
`
`typical error correction code that they employ such as Reed-Solomon.
`
`9
`
`10
`
`11
`
`12
`
`13
`
`14
`
`15
`
`16
`
`17
`
`18
`
`19
`
`20
`
`The expected number of messages before a collision occurs can be
`
`approximated using the formula ݊ ൌ ටʹ ݀ ଵ
`ଵି where Ě is the range of the
`
`hash function, and p is the desired probability. For example, if we let Ě = 365
`
`and Ɖ = ½, then we get Ŷ = 22.4944. If we consider CRC-32, and assume that
`
`the hash values produced are uniformly random then ݀ ൌ ʹଷଶ and if Ɖ = ½,
`
`then we get Ŷ = 77,162. In fact, CRC-32 is not uniformly random and so the
`
`value of n will be significantly lower. In other words, we can expect to hash
`
`fewer than 77,162 random and different messages with a CRC-32 function, in
`
`a benign environment, to have a 50-50 chance that two different messages
`
`will produce the same hash value. This is a very small number of messages
`
`given that the average personal computer may have more than a million files.
`
`23
`
`
`
`
`
`
`
`1
`
`2
`
`3
`
`4
`