UNIFIED PATENTS EXHIBIT 1015
TR-CS-96-05

The rsync algorithm

Andrew Tridgell and Paul Mackerras

June 1996

Joint Computer Science Technical Report Series
Department of Computer Science
Faculty of Engineering and Information Technology
Computer Sciences Laboratory
Research School of Information Sciences and Engineering
This technical report series is published jointly by the Department of Computer Science, Faculty of Engineering and Information Technology, and the Computer Sciences Laboratory, Research School of Information Sciences and Engineering, The Australian National University.
Please direct correspondence regarding this series to:

Technical Reports
Department of Computer Science
Faculty of Engineering and Information Technology
The Australian National University
Canberra ACT 0200
Australia

or send email to:

Technical.Reports@cs.anu.edu.au

A list of technical reports, including some abstracts and copies of some full reports, may be found at:

http://cs.anu.edu.au/techreports/

Recent reports in this series:

TR-CS-96-04 Peter Bailey and David Hawking. A parallel architecture for query processing over a terabyte of text. June 1996.

TR-CS-96-03 Brendan D. McKay. autoson - a distributed batch system for unix workstation networks (version 1.3). March 1996.

TR-CS-96-02 Richard P. Brent. Factorization of the tenth and eleventh Fermat numbers. February 1996.

TR-CS-96-01 Weifa Liang and Richard P. Brent. Constructing the spanners of graphs in parallel. January 1996.

TR-CS-95-08 David Hawking. The design and implementation of a parallel document retrieval engine. December 1995.

TR-CS-95-07 Raymond H. Chan and Michael K. Ng. Conjugate gradient methods for Toeplitz systems. September 1995.
The rsync algorithm

Andrew Tridgell
Paul Mackerras

Department of Computer Science
Australian National University
Canberra, ACT 0200, Australia

June 1996
Abstract

This report presents an algorithm for updating a file on one machine to be identical to a file on another machine. We assume that the two machines are connected by a low-bandwidth high-latency bi-directional communications link. The algorithm identifies parts of the source file which are identical to some part of the destination file, and only sends those parts which cannot be matched in this way. Effectively, the algorithm computes a set of differences without having both files on the same machine. The algorithm works best when the files are similar, but will also function correctly and reasonably efficiently when the files are quite different.
1 The problem

Imagine you have two files, A and B, and you wish to update B to be the same as A. The obvious method is to copy A onto B.

Now imagine that the two files are on machines connected by a slow communications link, for example a dial up IP link. If A is large, copying A onto B will be slow. To make it faster you could compress A before sending it, but that will usually only gain a factor of 2 to 4.

Now assume that A and B are quite similar, perhaps both derived from the same original file. To really speed things up you would need to take advantage of this similarity. A common method is to send just the differences between A and B down the link and then use this list of differences to reconstruct the file.

The problem is that the normal methods for creating a set of differences between two files rely on being able to read both files. Thus they require that both files are available beforehand at one end of the link. If they are not both available on the same machine, these algorithms cannot be used (once you had copied the file over, you wouldn't need the differences). This is the problem that rsync addresses.

The rsync algorithm efficiently computes which parts of a source file match some part of an existing destination file. These parts need not be sent across the link; all that is needed is a reference to the part of the destination file. Only parts of the source file which are not matched in this way need to be sent verbatim. The receiver can then construct a copy of the source file using the references to parts of the existing destination file and the verbatim material.
Trivially, the data sent to the receiver can be compressed using any of a range of common compression algorithms, for further speed improvements.
2 The rsync algorithm

Suppose we have two general purpose computers α and β. Computer α has access to a file A and β has access to file B, where A and B are "similar". There is a slow communications link between α and β.

The rsync algorithm consists of the following steps:

1. β splits the file B into a series of non-overlapping fixed-sized blocks of size S bytes.¹ The last block may be shorter than S bytes.

2. For each of these blocks β calculates two checksums: a weak "rolling" 32-bit checksum (described below) and a strong 128-bit MD4 checksum.

3. β sends these checksums to α.

4. α searches through A to find all blocks of length S bytes (at any offset, not just multiples of S) that have the same weak and strong checksum as one of the blocks of B. This can be done in a single pass very quickly using a special property of the rolling checksum described below.

5. α sends β a sequence of instructions for constructing a copy of A. Each instruction is either a reference to a block of B, or literal data. Literal data is sent only for those sections of A which did not match any of the blocks of B.

The end result is that β gets a copy of A, but only the pieces of A that are not found in B (plus a small amount of data for checksums and block indexes) are sent over the link. The algorithm also only requires one round trip, which minimises the impact of the link latency.

The most important details of the algorithm are the rolling checksum and the associated multi-alternate search mechanism which allows the all-offsets checksum search to proceed very quickly. These will be discussed in greater detail below.
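The reconstruction in step 5 is the simplest part of the scheme. As a rough illustration (ours, not the paper's; the in-memory instruction encoding and the names are assumptions), a minimal Python sketch of the receiver's job:

```python
def reconstruct(instructions, b_data, block_size):
    """Rebuild a copy of A from the instruction stream of step 5.

    Each instruction is either raw bytes (literal data sent verbatim)
    or ("ref", i), a reference to block i of the existing file B."""
    out = bytearray()
    for ins in instructions:
        if isinstance(ins, bytes):
            out += ins  # literal data, copied as-is
        else:
            _, i = ins  # reference: copy block i of B
            out += b_data[i * block_size : (i + 1) * block_size]
    return bytes(out)
```

For example, with B = b"abcdefgh" and S = 4, the stream [b"XY", ("ref", 1), b"Z"] yields b"XYefghZ".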
3 Rolling checksum

The weak rolling checksum used in the rsync algorithm needs to have the property that it is very cheap to calculate the checksum of a buffer X_2 ... X_{n+1} given the checksum of buffer X_1 ... X_n and the values of the bytes X_1 and X_{n+1}.

The weak checksum algorithm we used in our implementation was inspired by Mark Adler's adler-32 checksum. Our checksum is defined by

    a(k, l) = ( Σ_{i=k..l} X_i ) mod M

    b(k, l) = ( Σ_{i=k..l} (l - i + 1) X_i ) mod M

    s(k, l) = a(k, l) + 2^16 b(k, l)

where s(k, l) is the rolling checksum of the bytes X_k ... X_l. For simplicity and speed, we use M = 2^16.

The important property of this checksum is that successive values can be computed very efficiently using the recurrence relations

    a(k+1, l+1) = ( a(k, l) - X_k + X_{l+1} ) mod M

    b(k+1, l+1) = ( b(k, l) - (l - k + 1) X_k + a(k+1, l+1) ) mod M

Thus the checksum can be calculated for blocks of length S at all possible offsets within a file in a "rolling" fashion, with very little computation at each point.

Despite its simplicity, this checksum was found to be quite adequate as a first level check for a match of two file blocks. We have found in practice that the probability of this checksum matching when the blocks are not equal is quite low. This is important because the much more expensive strong checksum must be calculated for each block where the weak checksum matches.

----------
¹ We have found that values of S between 500 and 1000 bytes are quite good for most purposes.
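The definitions and recurrences above translate almost directly into code. The following Python sketch (ours, not the paper's implementation) computes (a, b) for a block and then slides the window one byte at a time, using M = 2^16 as in the text:

```python
M = 1 << 16  # the modulus from the text: M = 2^16

def weak_checksum(block):
    """a(k,l) and b(k,l) for a block of bytes, per the definitions above."""
    a = sum(block) % M
    n = len(block)
    # the weight (l - i + 1) becomes (n - j) for 0-based position j
    b = sum((n - j) * x for j, x in enumerate(block)) % M
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Apply the recurrences: drop the leading byte, append a new one."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b

def combine(a, b):
    """s(k,l) = a(k,l) + 2^16 * b(k,l), the 32-bit rolling checksum."""
    return a + (b << 16)
```

Rolling the window forward and recomputing the checksum from scratch give identical values, which is exactly the property that lets the search visit every offset of A cheaply.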
4 Checksum searching

Once α has received the list of checksums of the blocks of B, it must search A for any blocks at any offset that match the checksum of some block of B. The basic strategy is to compute the 32-bit rolling checksum for a block of length S starting at each byte of A in turn, and for each checksum, search the list for a match. To do this our implementation uses a simple three-level searching scheme.

The first level uses a 16-bit hash of the 32-bit rolling checksum and a 2^16 entry hash table. The list of checksum values (i.e., the checksums from the blocks of B) is sorted according to the 16-bit hash of the 32-bit rolling checksum. Each entry in the hash table points to the first element of the list for that hash value, or contains a null value if no element of the list has that hash value.

At each offset in the file the 32-bit rolling checksum and its 16-bit hash are calculated. If the hash table entry for that hash value is not a null value, the second level check is invoked.

The second level check involves scanning the sorted checksum list starting with the entry pointed to by the hash table entry, looking for an entry whose 32-bit rolling checksum matches the current value. The scan terminates when it reaches an entry whose 16-bit hash differs. If this search finds a match, the third level check is invoked.

The third level check involves calculating the strong checksum for the current offset in the file and comparing it with the strong checksum value in the current list entry. If the two strong checksums match, we assume that we have found a block of A which matches a block of B. In fact the blocks could be different, but the probability of this is microscopic, and in practice this is a reasonable assumption.
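A dictionary of lists captures the same three-level structure. This Python sketch is an illustration, not the paper's implementation: it uses a dict in place of the sorted list plus 2^16-entry pointer table, and taking the low 16 bits of the weak checksum as the tag is our assumption about the hash function:

```python
def build_table(block_checksums):
    """Index the (weak, strong) checksum pairs of B's blocks by 16-bit tag."""
    table = {}
    for idx, (weak, strong) in enumerate(block_checksums):
        table.setdefault(weak & 0xFFFF, []).append((idx, weak, strong))
    return table

def find_match(table, weak, compute_strong):
    """Three-level check. compute_strong is a callable so that the
    expensive strong checksum is only computed on a weak match."""
    for idx, w, s in table.get(weak & 0xFFFF, ()):  # level 1: 16-bit tag
        if w == weak:                                # level 2: 32-bit weak sum
            if s == compute_strong():                # level 3: strong checksum
                return idx
    return None
```

A tag hit that fails level 2, or a weak match that fails level 3 (a "false alarm"), simply falls through and the scan continues, matching the behaviour described above.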
When a match is found, α sends β the data in A between the current file offset and the end of the previous match, followed by the index of the block in B that matched. This data is sent immediately a match is found, which allows us to overlap the communication with further computation.

If no match is found at a given offset in the file, the rolling checksum is updated to the next offset and the search proceeds. If a match is found, the search is restarted at the end of the matched block. This strategy saves a considerable amount of computation for the common case where the two files are nearly identical. In addition, it would be a simple matter to encode the block indexes as runs, for the common case where a portion of A matches a series of blocks of B in order.
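The advance-by-one / restart-after-match loop can be sketched as follows. This is a deliberately simplified stand-in, not the paper's code: the weak checksum is a plain sum mod 2^16 recomputed at each offset rather than rolled, MD5 substitutes for the paper's MD4 (MD4 is not in Python's standard hashlib), and the index keeps at most one block per weak value:

```python
import hashlib

M = 1 << 16

def generate_delta(a_data, b_data, S):
    """Emit literal runs and block references describing A in terms of B.

    Simplified: weak checksum is sum(window) mod 2^16, recomputed per
    offset instead of rolled; MD5 stands in for the paper's MD4."""
    blocks = [b_data[i:i + S] for i in range(0, len(b_data), S)]
    index = {sum(blk) % M: (i, hashlib.md5(blk).digest())
             for i, blk in enumerate(blocks)}
    out, lit_start, i = [], 0, 0
    while i + S <= len(a_data):
        window = a_data[i:i + S]
        hit = index.get(sum(window) % M)
        if hit and hit[1] == hashlib.md5(window).digest():
            if lit_start < i:
                out.append(a_data[lit_start:i])  # unmatched bytes, verbatim
            out.append(("ref", hit[0]))          # reference to a block of B
            i += S                               # restart after the match
            lit_start = i
        else:
            i += 1                               # no match: slide one byte
    if lit_start < len(a_data):
        out.append(a_data[lit_start:])           # trailing literal run
    return out
```

Note how a match advances the scan by a whole block while a miss advances it by a single byte, which is why nearly identical files cost so little to process.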
5 Pipelining

The above sections describe the process for constructing a copy of one file on a remote system. If we have several files to copy, we can gain a considerable latency advantage by pipelining the process.

This involves initiating two independent processes. One of the processes generates and sends the checksums to α while the other receives the difference information from α and reconstructs the files.

If the communications link is buffered then these two processes can proceed independently and the link should be kept fully utilised in both directions for most of the time.
6 Results

To test the algorithm, tar files were created of the Linux kernel sources for two versions of the kernel. The two kernel versions were 1.99.10 and 2.0.0. These tar files are approximately 24 MB in size and are separated by 5 released patch levels.
Out of the 2441 files in the 1.99.10 release, 291 files had changed in the 2.0.0 release, 19 files had been removed and 25 files had been added.

A diff of the two tar files using the standard GNU diff utility produced over 32 thousand lines of output totalling 2.1 MB.

The following table shows the results for rsync between the two files with a varying block size.
block size | matches | tag hits | false alarms | data | written | read
[the numeric entries of this table did not survive extraction]
In each case, the CPU time taken was less than the time it takes to run diff on the two files.²

----------
² All the tests in this section were carried out using rsync version 0.5.
³ The wall clock time was approximately 2 minutes per run on a 50 MHz SPARC running SunOS, using rsh over loopback for communication. GNU diff took about 4 minutes.
The columns in the table are as follows:

block size The size in bytes of the checksummed blocks.

matches The number of times a block of B was found in A.

tag hits The number of times the 16-bit hash of the rolling checksum matched a hash of one of the checksums from B.

false alarms The number of times the 32-bit rolling checksum matched but the strong checksum didn't.

data The amount of file data transferred verbatim, in bytes.

written The total number of bytes written by α, including protocol overheads. This is almost all file data.

read The total number of bytes read by α, including protocol overheads. This is almost all checksum information.
The results demonstrate that for block sizes above 300 bytes, only a small fraction (around 5%) of the file was transferred. The amount transferred was also considerably less than the size of the diff file that would have been transferred if the diff/patch method of updating a remote file was used.

The checksums themselves took up a considerable amount of space, although much less than the size of the data transferred in each case. Each pair of checksums consumes 20 bytes: 4 bytes for the rolling checksum plus 16 bytes for the 128-bit MD4 checksum.

The number of false alarms was less than 1/1000 of the number of true matches, indicating that the 32-bit rolling checksum is quite good at screening out false matches.

The number of tag hits indicates that the second level of the checksum search algorithm was invoked about once every 50 characters. This is quite high because the total number of blocks in the file is a large fraction of the size of the tag hash table. For smaller files we would expect the tag hit rate to be much closer to the number of matches. For extremely large files, we should probably increase the size of the hash table.
The next table shows similar results for a much smaller set of files. In this case the files were not packed into a tar file first. Rather, rsync was invoked with an option to recursively descend the directory tree. The files used were from two source releases of another software package called Samba. The total source code size is 1.7 MB and the diff between the two releases is 4141 lines long totalling 120 kB.
block size | matches | tag hits | false alarms | data | written | read
[the numeric entries of this table did not survive extraction]
7 Availability

An implementation of rsync which provides a convenient interface similar to the common UNIX command rcp has been written and is available for download from ftp://samba.anu.edu.au/pub/rsync.