`
`On-line Data Compression
`in a Log-structured File System
`
`Michael Burrows, Charles Jerian, Butler Lampson,
`Timothy Mann
`
`April 15, 1992
`
`
`
`
`Systems Research Center
`
`DEC’s business and technology objectives require a strong research program.
`The Systems Research Center (SRC) and three other research laboratories are
`committed to filling that need.
`
SRC began recruiting its first research scientists in 1984—their charter, to advance
`the state of knowledge in all aspects of computer systems research. Our current
`work includes exploring high-performance personal computing, distributed com-
`puting, programming environments, system modelling techniques, specification
`technology, and tightly-coupled multiprocessors.
`
`Our approach to both hardware and software research is to create and use real
`systems so that we can investigate their properties fully. Complex systems cannot
`be evaluated solely in the abstract. Based on this belief, our strategy is to demon-
`strate the technical and practical feasibility of our ideas by building prototypes and
`using them as daily tools. The experience we gain is useful in the short term in
`enabling us to refine our designs, and invaluable in the long term in helping us to
`advance the state of knowledge about those systems. Most of the major advances
`in information systems have come through this strategy, including time-sharing,
`the ArpaNet, and distributed personal computing.
`
`SRC also performs work of a more mathematical flavor which complements our
`systems research. Some of this work is in established fields of theoretical computer
`science, such as the analysis of algorithms, computational geometry, and logics of
`programming. The rest of this work explores new ground motivated by problems
`that arise in our systems research.
`
`DEC has a strong commitment to communicating the results and experience gained
`through pursuing these activities. The Company values the improved understand-
`ing that comes with exposing and testing our ideas within the research community.
`SRC will therefore report results in conferences, in professional journals, and in
`our research report series. We will seek users for our prototype systems among
`those with whom we have common research interests, and we will encourage
`collaboration with university researchers.
`
`Robert W. Taylor, Director
`
`2
`
`
`
`
`
`
This report appeared in the proceedings of the Fifth International Conference
on Architectural Support for Programming Languages and Operating Systems
(ASPLOS-V), 12-15 October, 1992, published by ACM Press.
`
© Digital Equipment Corporation 1992
`
`This work may not be copied or reproduced in whole or in part for any commercial
`purpose. Permission to copy in part without payment of fee is granted for nonprofit
`educational and research purposes provided that all such whole or partial copies
`include the following: a notice that such copying is by permission of the Systems
`Research Center of Digital Equipment Corporation in Palo Alto, California; an
`acknowledgment of the authors and individual contributors to the work; and all
`applicable portions of the copyright notice. Copying, reproducing, or republishing
`for any other purpose shall require a license with payment of fee to the Systems
`Research Center. All rights reserved.
`
`
`
`
`Abstract
`
`We have incorporated on-line data compression into the low levels of
`a log-structured file system (Rosenblum’s Sprite LFS). Each block of data
`or meta-data is compressed as it is written to the disk and decompressed
`as it is read. The log-structuring overcomes the problems of allocation and
`fragmentation for variable-sized blocks. We observe compression factors
`ranging from 1.6 to 2.2, using algorithms running from 1.7 to 0.4 MBytes
`per second in software on a DECstation 5000/200. System performance is
`degraded by a few percent for normal activities (such as compiling or editing),
`and as much as a factor of 1.6 for file system intensive operations (such as
`copying multi-megabyte files).
`Hardware compression devices mesh well with this design. Chips are
`already available that operate at speeds exceeding disk transfer rates, which
`indicates that hardware compression would not only remove the performance
`degradation we observed, but might well increase the effective disk transfer
`rate beyond that obtainable from a system without compression.
`
`
`
`
Contents

1  Introduction
2  Sprite LFS
3  Adding compression to LFS
   3.1  Logical and physical disk space
   3.2  Reading a logical block
   3.3  Writing to the log
   3.4  Crash recovery
   3.5  Compression block size
   3.6  Free space
   3.7  File-specific compression
4  Compression algorithms
5  LFS and compression hardware
   5.1  System changes
   5.2  Compression hardware
6  Performance of prototype
7  Comparison with other work
8  Summary

Acknowledgements

References
`
`
`
`1 Introduction
`
`Building a file system that compresses the data it stores on disk is clearly an
`attractive idea. First, more data would fit on the disk. Also, if a fast hardware
`data compressor could be put into the data path to disk, it would increase the
`effective disk transfer rate by the compression factor, thus speeding up the system.
`Yet on-line data compression is seldom used in conventional file systems, for two
`reasons.
`First, compression algorithms do not provide uniform compression on all data.
`When a file block is overwritten, the new data may be compressed by a different
`amount from the data it supersedes. Therefore the file system cannot simply
`overwrite the original blocks—if the new data is larger than the old, it must be
`written to a place where there is more room; if it is smaller, the file system must
`either find some use for the freed space or see it go to waste. In either case, disk
`space tends to become fragmented, which reduces the effective compression.
`
    input block size    compression ratio
    (bytes)             (output size/input size)
      1K                68%
      2K                63%
      4K                59%
      8K                55%
     16K                53%
     32K                51%

The file progc from the Calgary Compression Corpus [3] was compressed using various
block sizes. The file contains 39611 bytes of C source. The entire file was compressed,
one block at a time. The compression algorithm is described below in Section 4 as
Algorithm 2.

Table 1: An example of improved compression with increased block size.
`
`Second, the best compression algorithms are adaptive—they use patterns dis-
`covered in one part of a block to do a better job of compressing information in
`other parts [3]. These algorithms work better on large blocks of data than on small
`blocks. Table 1 shows the variation in compression ratio with block size for a
`simple adaptive compression algorithm. The details vary for different compres-
`sion algorithms and different data, but the overall trend is the same—larger blocks
`make for better compression.
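The trend in Table 1 is easy to reproduce. The following C sketch compresses a
buffer one block at a time and reports the overall compression ratio. We
substitute zlib's compress2 for Algorithm 2, so the absolute ratios will differ
from the table, but larger blocks should still compress better.

    #include <stddef.h>
    #include <zlib.h>

    /* Compress buf[0..n) in independent chunks of `block' bytes and
     * return output size / input size, as in Table 1.  zlib stands in
     * for Algorithm 2 here; the absolute ratios will differ. */
    static double
    blockwise_ratio(const unsigned char *buf, size_t n, size_t block)
    {
        unsigned char out[2 * 32 * 1024];   /* ample for blocks up to 32K */
        size_t out_total = 0;

        for (size_t off = 0; off < n; off += block) {
            uLongf outlen = sizeof out;
            uLong inlen = (n - off < block) ? (uLong)(n - off) : (uLong)block;

            if (compress2(out, &outlen, buf + off, inlen, 6) != Z_OK)
                return -1.0;                /* compression error */
            out_total += outlen;
        }
        return (double)out_total / (double)n;
    }

Running this over the 39611-byte progc file with block sizes from 1K to 32K
should show the same downward trend as the table.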
`
`1
`
`7
`
`
`
`
`
[Figure 1: Simplified view of LFS's log. The log is a sequence of file blocks,
inode blocks, and inode map blocks, with some unused space.]
`
`However, it is difficult to arrange for sufficiently large blocks of data to be
`compressed all at once. Most file systems use block sizes that are too small for
`good compression, and increasing the block size would waste disk space in frag-
`mentation. Compressing multiple blocks at a time seems difficult to do efficiently,
`since adjacent blocks are often not written at the same time. Compressing whole
`files would also be less than ideal, since in many systems most files are only a few
`kilobytes [2].
`In a log-structured file system, the main data structure on disk is a sequentially
`written log. All new data, including modifications to existing files, is written to
`the end of the log. This technique has been demonstrated by Rosenblum and
`Ousterhout in a system called Sprite LFS [9]. The main goal of LFS was to provide
`improved performance by eliminating disk seeks on writes. In addition, LFS is
`ideally suited for adding compression—we simply compress the log as it is written
`to disk. No blocks are overwritten, so we do not have to make special arrangements
`when new data does not compress as well as existing data. Because blocks are
`written sequentially, the compressed blocks can be packed tightly together on disk,
`eliminating fragmentation. Better still, we can choose to compress blocks of any
`size we like, and if many related small files are created at the same time, they
`will be compressed together, so any similarities between the files will lead to better
`compression. We do, however, need additional bookkeeping to keep track of where
`compressed blocks fall on the disk.
`The remainder of this paper describes the relevant features of Sprite LFS,
`the changes needed to introduce compression, and the performance of our modi-
`fied system using simple software compression routines. We argue that suitable
`hardware compression devices can be readily obtained or constructed.
`
`2
`
`8
`
`
`
`2 Sprite LFS
`
`Sprite LFS places all file data and almost all meta-data (for example, directory
`information) in a sequentially written log. The file system data structures allow
`any block of an existing file to be located quickly in the log. For a full description
`see Rosenblum and Ousterhout’s paper [9].
`The log is stored in a chain of fixed-size segments, each of which is a contiguous
`array of disk sectors. The system keeps track of the current segment, which is the
`segment containing the current end of the log. When the current segment becomes
`full, the system picks an unused segment, links it onto the end of the chain, and
`makes it the current segment.
`As files are deleted and rewritten, some of the data in the log becomes obsolete.
`The space occupied by this data is reclaimed by the segment cleaner, which is
`analogous to a copying garbage collector in a programming language run-time
`system. The cleaner chooses segments and scans them to determine which blocks
`are still in use, appending them to the log in the usual manner. The cleaned
`segments are then unlinked from the log and made available for reuse.
`Some of the blocks written to the log contain file data, while others contain
`meta-data of various kinds, and the segment cleaner must be able to tell which is
`which. To do this, it uses summary blocks, which also appear in the log. These
`blocks contain a small amount of identifying information for each item written to
`the log. At least one summary block is written for each segment, and additional
`blocks are used if the first is not large enough to describe all the data being written.
`The summary blocks in a segment are linked together to form a list.
Inodes are an important type of meta-data. Like a UNIX inode (UNIX is a
registered trademark of UNIX System Laboratories, Inc.), a Sprite LFS inode
contains information about its associated file, including pointers to data
`blocks contained in the file. Whenever a file is modified, an updated copy of its
`inode is placed in the log. To avoid fragmentation, LFS places several inodes
`together in one block when it can. LFS maintains an inode map, which allows the
`current copy of any inode to be found quickly. The inode map is also written to
`the log.
`When LFS writes data to the log, it first builds a list of elements. Each element
`represents a number of disk blocks, and the order of the elements gives the order
`in which the blocks will appear on disk. Each element is a logical unit, such as a
`summary block, a block of inodes, or file data. It is only when an element is added
`to the list that the system can tell where the data in the element will fall on the disk.
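For concreteness, an element list entry might be declared as in the C sketch
below. These names are ours, for illustration; the actual Sprite LFS
declarations differ in detail.

    /* Hypothetical sketch of an element list entry. */
    enum elem_kind { ELEM_SUMMARY, ELEM_INODE_BLOCK, ELEM_FILE_DATA };

    struct element {
        enum elem_kind  kind;      /* what the disk blocks will contain */
        void           *data;      /* in-memory contents of the element */
        unsigned        nblocks;   /* number of disk blocks represented */
        struct element *next;      /* list order gives the order on disk */
    };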
Periodically, LFS writes a checkpoint, by flushing the current state of in-memory
data structures to the log and writing a pointer to the end of the log in a
`fixed location on disk. When recovering from crashes, LFS restores the file system
`state as of the last checkpoint, and then rolls forward by following the chain of log
`segments written subsequently.
`
`3 Adding compression to LFS
`
`3.1 Logical and physical disk space
`
`As with the original Sprite LFS, our modified system divides the physical disk
`into fixed-sized segments chained together to form a log. However, because of
`compression, each segment can contain more user data than its physical size might
`suggest. Accordingly, we allocate space within the segment in two ways: logical
`and physical. The physical space represents real disk blocks, while the logical
`space represents the space that would have been occupied by the data if it were not
`compressed.
`As each kilobyte of data is added to a segment, it occupies a kilobyte of logical
`space, but may occupy less physical space. A segment is declared full when either
`its physical or its logical space is exhausted. (If logical space becomes full first,
`some physical space is wasted. So logical space should be larger than physical
`space by a factor greater than the maximum compression expected.)
`Logical space is subdivided into compression blocks, which are the units of
`compression and decompression. A logical disk address consists of a segment
`number, a compression block number within the segment, and a sector number
`within the compression block. We chose a physical segment size of 512 KBytes
`and a maximum compression factor of four, so our logical segment size was 2
`MBytes. Our compression blocks are 16 KBytes, and our sector size is 512 bytes.
`Thus, our logical disk addresses fit easily in a 32-bit word:
`
    segment (20 bits) | compression block (7 bits) | sector (5 bits)
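Packing and unpacking these fields is simple shift-and-mask work, as the C
sketch below illustrates (the names are ours, not taken from the Sprite
sources).

    #include <stdint.h>

    /* 32-bit logical address: segment (20) | compression block (7) | sector (5) */
    #define SECTOR_BITS 5
    #define CBLOCK_BITS 7

    typedef uint32_t logical_addr;

    static logical_addr
    make_laddr(uint32_t seg, uint32_t cblock, uint32_t sector)
    {
        return (seg << (CBLOCK_BITS + SECTOR_BITS))
             | (cblock << SECTOR_BITS)
             | sector;
    }

    static uint32_t laddr_segment(logical_addr a) { return a >> (CBLOCK_BITS + SECTOR_BITS); }
    static uint32_t laddr_cblock(logical_addr a)  { return (a >> SECTOR_BITS) & 0x7f; }
    static uint32_t laddr_sector(logical_addr a)  { return a & 0x1f; }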
`
`3.2 Reading a logical block
`
`Most of the modified file system deals exclusively in logical disk addresses. For
`example, the disk addresses used by the file system cache are logical, and so are the
`disk addresses placed in inodes. Physical addresses are needed only by modules at
`the lowest level of the file system.
`
`4
`
`10
`
`
`
[Figure 2: Logical and physical views of a segment. A 2 MByte logical segment
of 16 KByte compression blocks (with some unused logical space) is stored in a
512 KByte physical segment, which ends with the logical block map and any
unused physical space.]
`
`The module that reads a disk block given its logical address needs a way to
`find the physical address of the compressed bytes. We keep a logical block map
`for each segment, which is simply an array indexed by compression block number,
`whose entries are the physical byte addresses of the blocks relative to the start of
`the segment. The block map is constructed in memory as the segment is being
`compressed, and written to the end of the segment when the segment is full. The
`maps are needed for all file reads, so they are cached in memory whenever possible.
`Because our logical segments contain 128 compression blocks and our physical
`segments are 512 KBytes, our block maps each contain 128 four-byte entries,
`which exactly fills one 512-byte disk sector. (The entries could be reduced to two
`bytes each by restricting compression blocks to begin on eight byte boundaries
`within the segment, but we did not implement this optimization in our prototype.)
`The procedure for finding the compressed data associated with a logical address is
`as follows:
`
`1. Extract the segment number from the logical address. Use it to find the
`logical block map for the segment.
`
`2. Extract the compression block number from the address. Use it to index the
`logical block map. This yields the physical byte offset of the compressed
`data within the segment.
`
`5
`
`11
`
`
`
`3. Examine the next entry in the map, to find the start of the next block. This
`determines how much data should be read from the disk.
`
`4. Read the compressed data from the disk and decompress it.
`
`5. Extract the sector number from the logical address. Use it to identify the
`desired sector within the decompressed block.
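Put together, the read path might look like the C sketch below, which follows
the five steps above. This is our illustration, not Sprite LFS code: the
laddr_* helpers are from the sketch in Section 3.1, and block_map_for_segment,
read_physical, and decompress are hypothetical lower-level routines. The last
block in a segment, whose extent is bounded by the map's final entry, is
glossed over.

    #include <stdint.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define CBLOCK_SIZE (16 * 1024)    /* one decompressed compression block */

    /* Hypothetical lower-level routines: */
    extern const uint32_t *block_map_for_segment(uint32_t seg);
    extern int  read_physical(uint32_t seg, uint32_t off, uint32_t len, void *buf);
    extern void decompress(const void *in, uint32_t inlen, void *out);

    int
    read_logical(logical_addr a, void *sector_out)
    {
        uint32_t seg    = laddr_segment(a);              /* step 1 */
        uint32_t cblock = laddr_cblock(a);
        const uint32_t *map = block_map_for_segment(seg);

        uint32_t start = map[cblock];                    /* step 2 */
        uint32_t end   = map[cblock + 1];                /* step 3 */

        unsigned char compressed[CBLOCK_SIZE + 64];      /* worst case: no compression */
        unsigned char plain[CBLOCK_SIZE];

        if (read_physical(seg, start, end - start, compressed) < 0)
            return -1;
        decompress(compressed, end - start, plain);      /* step 4 */

        memcpy(sector_out,                               /* step 5 */
               plain + laddr_sector(a) * SECTOR_SIZE, SECTOR_SIZE);
        return 0;
    }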
`
`Unfortunately, this procedure reads and decompresses a full compression block
`even if the caller wanted only some smaller unit, such as a file system block or
`a physical sector. We alleviate this problem by caching the entire decompressed
`block in memory, rather than just caching the requested sectors. The data could
`be placed in the file system buffer cache, but for simplicity in our prototype, we
`cached the last decompressed block within the read routine. Sprite LFS reads files
`sequentially in 4 KByte units, so this simple caching strategy typically achieves
`three hits for each 16 KByte compression block when reading large files.
`When the file system is reading non-sequentially, the additional time to read a
`full compression block cannot be hidden by caching. Fortunately, this time is small
`compared to the rotational latency. The time needed to decompress the full block
`in software is several milliseconds; it would be much smaller if decompression
`were implemented in hardware.
`
`3.3 Writing to the log
`
`Conceptually, it is simple to write compressed data to the log. As LFS constructs its
`element list, it can compress the data into a buffer, one compression block at a time.
`When there is no more data to add to the list, or when logical or physical segment
`space is exhausted, it writes the buffer to disk and updates the logical block map
`for the segment. In practice, however, the design was more complicated.
`In Sprite LFS, the contents of certain kinds of element, such as inode blocks,
`are allowed to change even after they are added to the element list; we will call
`these variable elements. For example, when LFS puts the first inode in a segment,
`it allocates an empty inode block and places it on the element list. When more
`inodes are added to the segment, they are placed in this inode block. Eventually,
`it may become full, at which point another block is allocated for inodes. Thus,
`an inode block may have inodes added to it long after it has been placed on the
`element list. A similar strategy is used for summary blocks—this technique allows
`LFS to make better use of disk space when allocating small data structures.
In an unmodified LFS, these variable elements present no difficulty; the contents
of elements can be changed freely until the segment is written to disk, as long
as their size does not change. However, in our system, the data for each element
`must be known before the element can be compressed, and compression must take
`place early enough to decide which elements fit into the current segment. We know
`of two ways to accommodate variable elements in a compressed LFS.
`One technique is to delay compressing variable elements until just before the
`segment is written to disk. Enough physical space must be reserved to hold the
`variable elements, assuming the worst-case compression ratio (i.e. that they cannot
`be compressed at all). With this approach, there is little benefit in compressing the
`variable elements, since the space saved cannot easily be used.
`A second technique is to delay compressing all elements as long as possible,
`allowing variable elements to be changed freely until they are compressed, but
`not afterwards. We do not compress elements as they are added; instead, we
`reserve enough physical space for each element to accommodate the worst-case
`compression ratio, until we can no longer be sure the next element will fit into
`the physical segment. Then we compress all the elements, correct the amount of
`physical space remaining to reflect the actual compression achieved, and mark the
`elements as no longer variable. When an inode block is compressed, for example,
`it can be marked as full so that no further inodes will be added to it. A new block
`will automatically be allocated if needed.
The following pseudo-code illustrates the action taken for each element elem,
containing size(elem) bytes. The worst_case function returns the largest
number of bytes that may be needed to represent the data when compressed.

    if (size(elem) > logical_space_left) {
        goto segment_full
    }
    if (worst_case(elem) > phys_space_left) {
        compress all elements not yet compressed
        increase phys_space_left by bytes saved
        mark inode blocks full
        if (worst_case(elem) > phys_space_left) {
            goto segment_full
        }
    }
    add elem to element list, without compression
    reduce logical_space_left by size(elem)
`
`This approach is less general than the first, since it works only if the code
`that modifies variable elements can be told to stop doing so at an arbitrary point;
`however, it does allow variable elements to be compressed.
`Both the summary blocks and inode blocks of Sprite LFS could be handled
using either technique. As a matter of expediency, we used the first technique for
summary blocks and the second for inode blocks, and we chose not to compress
`summary blocks. These choices worked out fairly well: there are few summary
`blocks per segment, so little compression is lost, and crash recovery is not adversely
`affected (see Section 3.4). Inode blocks tend to be less full than in unmodified
`LFS—two or three blocks might be allocated per segment where only one would
`have been used before—but little physical space is wasted because a partially-
`empty block compresses exceedingly well. However, programs that read many
`inodes (such as the file tree walker find) do not perform as well; see Section 6
`below for details.
`A third technique for dealing with variable elements is to eliminate them. We
`could have modified Sprite LFS to fill summary blocks and inode blocks completely
`before adding them to the element list. This would have been difficult, however,
`because the logical disk address of a data block is not known until it is added to
`the element list, and the current LFS implementation needs to know the addresses
`of summary blocks and inode blocks as it fills them in, for use as pointers in other
`data structures. Also, LFS needs to know that the summary block it is filling will
`fit into the current segment, not be spilled over into the next one. Therefore we
`chose not to take this approach in our prototype.
`
`3.4 Crash recovery
`
`Our changes introduce a new problem in crash recovery. The system could crash
`after a segment has been written, but before the logical block map has been
`committed to disk. One might view this as an argument for writing the logical
`block map first, at the start of the segment. However, LFS often fills a segment
`using a small number of partial writes—these are used when it is necessary to
`commit data to disk, but the data does not fill an entire segment. So, on successive
`partial writes the segment’s block map would be updated in place, and could
`therefore be lost in a crash. Alternatively, we could allocate a new block map for
`each partial segment written, but this would consume additional disk space and
`complicate the process of reading and parsing the maps.
`Fortunately, this problem is easy to solve. We maintain the invariant that all
`segments on the disk have a valid logical block map on disk, except possibly the
`current segment. When the file system examines the disk after a crash, it must first
`read the current segment, decompress it from the beginning, and reconstruct the
`logical block map. This action is similar to rolling forward from a checkpoint.
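The reconstruction is a single pass over the segment, as in the hypothetical C
sketch below. We assume a routine decompress_block that consumes one
compression block's worth of compressed input, produces its decompressed
contents, and returns the number of compressed bytes consumed (zero at the end
of valid log data).

    #include <stdint.h>

    extern uint32_t decompress_block(const unsigned char *in, uint32_t avail,
                                     unsigned char *out);

    /* Rebuild the logical block map of the current segment after a crash. */
    void
    rebuild_block_map(const unsigned char *seg_buf, uint32_t seg_len,
                      uint32_t map[])
    {
        unsigned char plain[16 * 1024];
        uint32_t off = 0;
        unsigned i = 0;

        while (off < seg_len) {
            uint32_t used = decompress_block(seg_buf + off, seg_len - off, plain);
            if (used == 0)
                break;              /* ran off the end of valid data */
            map[i++] = off;         /* compression block i began here */
            off += used;
        }
        map[i] = off;               /* bounds the final block */
    }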
`We do not compress the checkpoint areas or the summary blocks. Together,
`these blocks contain enough information for LFS to find the end of the log during
recovery: we modified the segment cleaner to read segments without using the
block map in order to exercise this algorithm.
`
`3.5 Compression block size
`
`We would like to compress data in large blocks, but there are reasons for choosing
`smaller blocks.
`First, when reading bytes from the disk, decompression must start from the
`beginning of a compressed block. Thus, the mean latency for reading a randomly
`chosen byte on the disk is increased by the time to read the block from the disk
`and decompress it. A 16 KByte block seems to be a reasonable choice, since it is
`large enough to allow the compression algorithm to obtain most of the compression
`available, but small enough that it adds only a few milliseconds to the transfer time.
In software on a DECstation 5000/200 (DECstation is a trademark of Digital
Equipment Corporation), the decompression time is 9 ms for the
`fastest algorithm we tried; in hardware, decompression would be faster than the
`disk transfer, and could be overlapped with it more easily.
`A second issue is that applications often commit small amounts of data to
`disk, resulting in poor compression. The obvious way to overcome this problem is
`to employ a small amount of non-volatile memory to postpone compression until
`more data is available. Alternatively, one can ignore the problem and write the data
`immediately, because the segment cleaner will later combine the data from adjacent
`writes to form a full-sized compression block. This is similar to the strategy used
`in unmodified LFS to recover unused space left in inode and summary blocks after
`small amounts of data have been committed.
`The cleaner saves space in other minor ways too. In our system, when two
`compression blocks are written together, no gap need be left between them. But
`when two blocks are forced to disk independently, the second must start on a sector
`boundary to avoid the danger of damaging previously written data. This causes
`fragmentation that is removed later by the segment cleaner.
`
`3.6 Free space
`
`One problem with using compression in a file system is that it leaves the naive user
`open to surprises. For one thing, it is no longer obvious how to report the amount
`of space remaining on the disk! It seems best to report the amount of data that
`could be put on the disk assuming worst-case compression (i.e. no compression at
`all), so that the user is more likely to be agreeably surprised than upset.
`A more worrying problem is that actions that do not consume disk space in
`conventional file systems may do so when compression is used. For example, if a
`
`2DECstation is a trademark of Digital Equipment Corporation
`
`9
`
`15
`
`
`
`block in a file is overwritten, the new data may not compress as much as the old
`data. When compression is used, such operations cannot be allowed when the disk
`is full.
`
`3.7 File-specific compression
`
`Different compression algorithms are better suited to different data, and so one
`might like to choose the algorithm to be used for each file. One way to do this in
`our system would be to place some additional hints in each inode. The LFS module
`that places file data blocks into the element list could use the hints to mark elements
`for compression by different algorithms. One possible use of such a scheme would
`be to indicate that certain files should not be compressed at all; this would free
`those files from the restriction noted in the previous subsection.
`While file-specific compression could easily be applied to our current software
`implementation, it would certainly complicate the hardware design described in
`Section 5. Moreover, only large files would benefit from this approach, since only
`one compression algorithm would be used for each 16 KBytes of data.
`
`4 Compression algorithms
`
`A wide variety of compression algorithms could be used for this application,
`but fast, simple algorithms are clearly preferable. One suitable algorithm was
`introduced by David Wheeler in his red and exp programs [13]; similar algorithms
`were also described by Raita and Teuhola [8]. Williams has developed algorithms
`that are particularly fast for decompression [15, 16].
The simplest form of Wheeler’s algorithm (which we will call Algorithm 1) is
as follows. A hash table t, whose entries are bytes, is initialized to a known state,
such as all zeroes. For each input byte c, the compressor constructs a hash value
h based on the preceding three bytes. The value found in table entry t[h] is called
the predicted character. If t[h] = c, the compressor outputs a token indicating
a correct prediction. If t[h] ≠ c, the compressor sets the entry t[h] to be c, and
outputs a token indicating that prediction failed and that the input character was c.
If this algorithm expands the block, the compressor outputs a token indicating that
no compression has occurred, followed by a copy of the original data.
The decompressor maintains a similar hash table t. For each token read from
the compressor, the decompressor constructs a hash value h based on the last three
bytes written to its output. The hash function and initial setting of t must be identical
in compressor and decompressor. If the token indicates a successful prediction,
the decompressor reads and outputs the byte t[h]. If it indicates prediction failure
with the input character c, the decompressor sets the entry t[h] to c and outputs c.
In our tests, using Sprite system binaries and an operating system source tree,
this algorithm predicts around 50% of the bytes in a typical 16 KByte block and
compresses it down to 10 KBytes. We used a modest hash table size (n = 4096
entries) and a simple hash function (h = (256c3 ⊕ 16c2 ⊕ c1) mod n, where ci is
the ith character preceding the current character c, and ⊕ is bitwise exclusive-or),
and we chose to encode a correct prediction with one bit and a missed prediction
with nine bits.
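The C sketch below shows the compressor side of Algorithm 1 with these
parameters. It is our reconstruction from the description above, not Wheeler's
code, and the escape for blocks that expand is omitted; the decompressor
mirrors it, reading one flag bit and, on a miss, the literal byte.

    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SIZE 4096

    /* Minimal MSB-first bit writer.  out[] must be zeroed by the caller
     * and large enough for 9 bits per input byte in the worst case. */
    struct bits { uint8_t *out; size_t pos; };

    static void put_bit(struct bits *b, unsigned v)
    {
        if (v)
            b->out[b->pos >> 3] |= (uint8_t)(1u << (7 - (b->pos & 7)));
        b->pos++;
    }

    static void put_byte(struct bits *b, uint8_t v)
    {
        for (int i = 7; i >= 0; i--)
            put_bit(b, (v >> i) & 1u);
    }

    /* Algorithm 1: one bit for a correct prediction, nine bits (flag
     * plus literal) for a miss.  Returns the output length in bits. */
    size_t alg1_compress(const uint8_t *in, size_t n, uint8_t *out)
    {
        uint8_t t[TABLE_SIZE] = { 0 };     /* prediction table, all zeroes */
        unsigned c1 = 0, c2 = 0, c3 = 0;   /* the three preceding bytes */
        struct bits b = { out, 0 };

        for (size_t i = 0; i < n; i++) {
            unsigned h = ((256u * c3) ^ (16u * c2) ^ c1) % TABLE_SIZE;
            if (t[h] == in[i]) {
                put_bit(&b, 1);            /* correct prediction */
            } else {
                t[h] = in[i];              /* update the prediction */
                put_bit(&b, 0);
                put_byte(&b, in[i]);
            }
            c3 = c2; c2 = c1; c1 = in[i];
        }
        return b.pos;
    }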
`An improved version of the algorithm (which we will call Algorithm 2) adds a
`second hash table holding an alternative prediction, and a token to indicate a correct
`prediction in the second table. In addition, it applies Huffman coding to the stream
`of predictions and misses, instead of using a fixed-length code for misses. Our
`implementation uses a static Huffman code computed from frequencies of bytes
`seen on a typical UNIX disk.
`Besides the algorithms based on Wheeler’s ideas, we tried the LZRW1-A and
`LZRW3-A algorithms due to Williams. A full description and implementation are
`available elsewhere [15, 16], so we omit the details here.
`
    Compressor     Compression ratio   Compression speed   Decompression speed
                   (output/input)      (KBytes/s)          (KBytes/s)
    Algorithm 1    61%                 1700                2200
    Algorithm 2    51%                  590                 670
    LZRW1-A        52%                  910                2400
    LZRW3-A        48%                  360                1500
    compress       50%                  250                 410
    zoo -h         36%                   45                 390

This table shows the performance of six compression algorithms on 240 MBytes of
data from a log-structured file system containing Sprite operating system source and
binaries. The figures give the compression speed, decompression speed, and the ratio
of output size to input size (compression ratio). Compression was performed on 16
KByte blocks. Compression speed is in kilobytes of uncompressed data processed
per second of CPU time, given to two significant figures. Times were measured on a
DECstation 5000/200.

Table 2: Comparison of compression algorithms.
`
`Table 2 illustrates the performance of these algorithms on data typical for a
UNIX program development environment. For comparison, we include figures for
the popular compress utility, which uses the LZC algorithm, and the zoo archiver,
`which uses the LZSS algorithm. The table shows that the algorithms we chose are
`quite fast, but better compression could be obtained by sacrificing speed.
`Bell, Witten, and Cleary give a more thorough comparison of compression
`algorithms and their effectiveness on different sorts of data [3]. They also describe
`the LZC and LZSS algorithms.
`
`5 LFS and compression hardware
`
`We have arranged our system so that the compression algorithm can be changed
`easily. An obvious improvement would be to replace the compression routine
`with a piece of hardware. As noted in the introduction, this would reduce the
`performance penalty exacted by software compression, and would increase the
`effective disk transfer rate. We have not yet integrated compression hardware into
`our prototype, but in this section we discuss how it might best be done.
`
`5.1 System changes
`
`The simplest way to add hardware compression to our design would be to build a
`DMA device that reads uncompressed data from one buffer and writes compressed
`data to another (and vice versa), and use it as a direct replacement for our software
`compression module.
Consider a disk with a transfer rate of d MBytes per second, and a compressor
that achieves compression ratio r at a rate of c MBytes per second. Without
compression, the time to transfer n MBytes is n/d seconds. With a separate
hardware compressor, this becomes n/c + nr/d seconds. (Seek and rotational
delays are unaffected. We assume that DMA setup times are negligible for blocks
of more than a few kilobytes.) We would like to reduce the total time, which
implies that c > d/(1 - r). For example, when r = 0.5 and d = 2 MBytes per
second, c must exceed 4 MBytes per second.
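The threshold follows from comparing the two expressions directly; in LaTeX
notation, the one-line derivation (assuming, as above, that compression and
disk transfer are not overlapped) is:

    \frac{n}{c} + \frac{nr}{d} < \frac{n}{d}
    \;\Longrightarrow\; \frac{1}{c} < \frac{1-r}{d}
    \;\Longrightarrow\; c > \frac{d}{1-r}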