`
`On-line Data Compression
`in a Log-structured File System
`
`Michael Burrows, Charles Jerian, Butler Lampson,
`Timothy Mann
`
`April 15, 1992
`
`
`
`
`Systems Research Center
`
`DEC’s business and technology objectives require a strong research program.
`The Systems Research Center (SRC) and three other research laboratories are
`committed to filling that need.
`
SRC began recruiting its first research scientists in 1984—their charter, to advance
`the state of knowledge in all aspects of computer systems research. Our current
`work includes exploring high-performance personal computing, distributed com-
`puting, programming environments, system modelling techniques, specification
`technology, and tightly-coupled multiprocessors.
`
`Our approach to both hardware and software research is to create and use real
`systems so that we can investigate their properties fully. Complex systems cannot
`be evaluated solely in the abstract. Based on this belief, our strategy is to demon-
`strate the technical and practical feasibility of our ideas by building prototypes and
`using them as daily tools. The experience we gain is useful in the short term in
`enabling us to refine our designs, and invaluable in the long term in helping us to
`advance the state of knowledge about those systems. Most of the major advances
`in information systems have come through this strategy, including time-sharing,
`the ArpaNet, and distributed personal computing.
`
`SRC also performs work of a more mathematical flavor which complements our
`systems research. Some of this work is in established fields of theoretical computer
`science, such as the analysis of algorithms, computational geometry, and logics of
`programming. The rest of this work explores new ground motivated by problems
`that arise in our systems research.
`
`DEC has a strong commitment to communicating the results and experience gained
`through pursuing these activities. The Company values the improved understand-
`ing that comes with exposing and testing our ideas within the research community.
`SRC will therefore report results in conferences, in professional journals, and in
`our research report series. We will seek users for our prototype systems among
`those with whom we have common research interests, and we will encourage
`collaboration with university researchers.
`
`Robert W. Taylor, Director
`
`2
`
`
`
`
`
`
This report appeared in the proceedings of the Fifth International Conference
on Architectural Support for Programming Languages and Operating Systems
(ASPLOS-V), 12-15 October, 1992, published by ACM Press.
`
© Digital Equipment Corporation 1992
`
`This work may not be copied or reproduced in whole or in part for any commercial
`purpose. Permission to copy in part without payment of fee is granted for nonprofit
`educational and research purposes provided that all such whole or partial copies
`include the following: a notice that such copying is by permission of the Systems
`Research Center of Digital Equipment Corporation in Palo Alto, California; an
`acknowledgment of the authors and individual contributors to the work; and all
`applicable portions of the copyright notice. Copying, reproducing, or republishing
`for any other purpose shall require a license with payment of fee to the Systems
`Research Center. All rights reserved.
`
`
`
`
`Abstract
`
`We have incorporated on-line data compression into the low levels of
`a log-structured file system (Rosenblum’s Sprite LFS). Each block of data
`or meta-data is compressed as it is written to the disk and decompressed
`as it is read. The log-structuring overcomes the problems of allocation and
`fragmentation for variable-sized blocks. We observe compression factors
`ranging from 1.6 to 2.2, using algorithms running from 1.7 to 0.4 MBytes
`per second in software on a DECstation 5000/200. System performance is
`degraded by a few percent for normal activities (such as compiling or editing),
`and as much as a factor of 1.6 for file system intensive operations (such as
`copying multi-megabyte files).
`Hardware compression devices mesh well with this design. Chips are
`already available that operate at speeds exceeding disk transfer rates, which
`indicates that hardware compression would not only remove the performance
`degradation we observed, but might well increase the effective disk transfer
`rate beyond that obtainable from a system without compression.
`
`
`
`
Contents

1  Introduction
2  Sprite LFS
3  Adding compression to LFS
   3.1  Logical and physical disk space
   3.2  Reading a logical block
   3.3  Writing to the log
   3.4  Crash recovery
   3.5  Compression block size
   3.6  Free space
   3.7  File-specific compression
4  Compression algorithms
5  LFS and compression hardware
   5.1  System changes
   5.2  Compression hardware
6  Performance of prototype
7  Comparison with other work
8  Summary

Acknowledgements

References
`
`
`
`1 Introduction
`
`Building a file system that compresses the data it stores on disk is clearly an
`attractive idea. First, more data would fit on the disk. Also, if a fast hardware
`data compressor could be put into the data path to disk, it would increase the
`effective disk transfer rate by the compression factor, thus speeding up the system.
`Yet on-line data compression is seldom used in conventional file systems, for two
`reasons.
`First, compression algorithms do not provide uniform compression on all data.
`When a file block is overwritten, the new data may be compressed by a different
`amount from the data it supersedes. Therefore the file system cannot simply
`overwrite the original blocks—if the new data is larger than the old, it must be
`written to a place where there is more room; if it is smaller, the file system must
`either find some use for the freed space or see it go to waste. In either case, disk
`space tends to become fragmented, which reduces the effective compression.
`
    input block size    compression ratio
    (bytes)             (output size/input size)
      1K                68%
      2K                63%
      4K                59%
      8K                55%
     16K                53%
     32K                51%

The file progc from the Calgary Compression Corpus [3] was compressed using various
block sizes. The file contains 39611 bytes of C source. The entire file was compressed,
one block at a time. The compression algorithm is described below in Section 4 as
Algorithm 2.

Table 1: An example of improved compression with increased block size.
`
`Second, the best compression algorithms are adaptive—they use patterns dis-
`covered in one part of a block to do a better job of compressing information in
`other parts [3]. These algorithms work better on large blocks of data than on small
`blocks. Table 1 shows the variation in compression ratio with block size for a
`simple adaptive compression algorithm. The details vary for different compres-
`sion algorithms and different data, but the overall trend is the same—larger blocks
`make for better compression.
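The trend in Table 1 is easy to reproduce. The following C sketch compresses a
buffer one block at a time and reports the overall compression ratio. We
substitute zlib's compress2 for Algorithm 2, so the absolute ratios will differ
from the table, but larger blocks should still compress better.

    #include <stddef.h>
    #include <zlib.h>

    /* Compress buf[0..n) in independent chunks of `block' bytes and
     * return output size / input size, as in Table 1.  zlib stands in
     * for Algorithm 2 here; the absolute ratios will differ. */
    static double
    blockwise_ratio(const unsigned char *buf, size_t n, size_t block)
    {
        unsigned char out[2 * 32 * 1024];   /* ample for blocks up to 32K */
        size_t out_total = 0;

        for (size_t off = 0; off < n; off += block) {
            uLongf outlen = sizeof out;
            uLong inlen = (n - off < block) ? (uLong)(n - off) : (uLong)block;

            if (compress2(out, &outlen, buf + off, inlen, 6) != Z_OK)
                return -1.0;                /* compression error */
            out_total += outlen;
        }
        return (double)out_total / (double)n;
    }

Running this over the 39611-byte progc file with block sizes from 1K to 32K
should show the same downward trend as the table.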
`
`1
`
`7
`
`
`
`
`
[Figure 1: Simplified view of LFS's log. The log is a sequence of file blocks,
inode blocks, and inode map blocks, with some unused space.]
`
`However, it is difficult to arrange for sufficiently large blocks of data to be
`compressed all at once. Most file systems use block sizes that are too small for
`good compression, and increasing the block size would waste disk space in frag-
`mentation. Compressing multiple blocks at a time seems difficult to do efficiently,
`since adjacent blocks are often not written at the same time. Compressing whole
`files would also be less than ideal, since in many systems most files are only a few
`kilobytes [2].
`In a log-structured file system, the main data structure on disk is a sequentially
`written log. All new data, including modifications to existing files, is written to
`the end of the log. This technique has been demonstrated by Rosenblum and
`Ousterhout in a system called Sprite LFS [9]. The main goal of LFS was to provide
`improved performance by eliminating disk seeks on writes. In addition, LFS is
`ideally suited for adding compression—we simply compress the log as it is written
`to disk. No blocks are overwritten, so we do not have to make special arrangements
`when new data does not compress as well as existing data. Because blocks are
`written sequentially, the compressed blocks can be packed tightly together on disk,
`eliminating fragmentation. Better still, we can choose to compress blocks of any
`size we like, and if many related small files are created at the same time, they
`will be compressed together, so any similarities between the files will lead to better
`compression. We do, however, need additional bookkeeping to keep track of where
`compressed blocks fall on the disk.
`The remainder of this paper describes the relevant features of Sprite LFS,
`the changes needed to introduce compression, and the performance of our modi-
`fied system using simple software compression routines. We argue that suitable
`hardware compression devices can be readily obtained or constructed.
`
`2
`
`8
`
`
`
`2 Sprite LFS
`
`Sprite LFS places all file data and almost all meta-data (for example, directory
`information) in a sequentially written log. The file system data structures allow
`any block of an existing file to be located quickly in the log. For a full description
`see Rosenblum and Ousterhout’s paper [9].
`The log is stored in a chain of fixed-size segments, each of which is a contiguous
`array of disk sectors. The system keeps track of the current segment, which is the
`segment containing the current end of the log. When the current segment becomes
`full, the system picks an unused segment, links it onto the end of the chain, and
`makes it the current segment.
`As files are deleted and rewritten, some of the data in the log becomes obsolete.
`The space occupied by this data is reclaimed by the segment cleaner, which is
`analogous to a copying garbage collector in a programming language run-time
`system. The cleaner chooses segments and scans them to determine which blocks
`are still in use, appending them to the log in the usual manner. The cleaned
`segments are then unlinked from the log and made available for reuse.
`Some of the blocks written to the log contain file data, while others contain
`meta-data of various kinds, and the segment cleaner must be able to tell which is
`which. To do this, it uses summary blocks, which also appear in the log. These
`blocks contain a small amount of identifying information for each item written to
`the log. At least one summary block is written for each segment, and additional
`blocks are used if the first is not large enough to describe all the data being written.
`The summary blocks in a segment are linked together to form a list.
Inodes are an important type of meta-data. Like a UNIX inode (UNIX is a
registered trademark of UNIX System Laboratories, Inc.), a Sprite LFS inode
contains information about its associated file, including pointers to data
`blocks contained in the file. Whenever a file is modified, an updated copy of its
`inode is placed in the log. To avoid fragmentation, LFS places several inodes
`together in one block when it can. LFS maintains an inode map, which allows the
`current copy of any inode to be found quickly. The inode map is also written to
`the log.
`When LFS writes data to the log, it first builds a list of elements. Each element
`represents a number of disk blocks, and the order of the elements gives the order
`in which the blocks will appear on disk. Each element is a logical unit, such as a
`summary block, a block of inodes, or file data. It is only when an element is added
`to the list that the system can tell where the data in the element will fall on the disk.
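For concreteness, an element list entry might be declared as in the C sketch
below. These names are ours, for illustration; the actual Sprite LFS
declarations differ in detail.

    /* Hypothetical sketch of an element list entry. */
    enum elem_kind { ELEM_SUMMARY, ELEM_INODE_BLOCK, ELEM_FILE_DATA };

    struct element {
        enum elem_kind  kind;      /* what the disk blocks will contain */
        void           *data;      /* in-memory contents of the element */
        unsigned        nblocks;   /* number of disk blocks represented */
        struct element *next;      /* list order gives the order on disk */
    };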
Periodically, LFS writes a checkpoint, by flushing the current state of in-memory
data structures to the log and writing a pointer to the end of the log in a
`fixed location on disk. When recovering from crashes, LFS restores the file system
`state as of the last checkpoint, and then rolls forward by following the chain of log
`segments written subsequently.
`
`3 Adding compression to LFS
`
`3.1 Logical and physical disk space
`
`As with the original Sprite LFS, our modified system divides the physical disk
`into fixed-sized segments chained together to form a log. However, because of
`compression, each segment can contain more user data than its physical size might
`suggest. Accordingly, we allocate space within the segment in two ways: logical
`and physical. The physical space represents real disk blocks, while the logical
`space represents the space that would have been occupied by the data if it were not
`compressed.
`As each kilobyte of data is added to a segment, it occupies a kilobyte of logical
`space, but may occupy less physical space. A segment is declared full when either
`its physical or its logical space is exhausted. (If logical space becomes full first,
`some physical space is wasted. So logical space should be larger than physical
`space by a factor greater than the maximum compression expected.)
`Logical space is subdivided into compression blocks, which are the units of
`compression and decompression. A logical disk address consists of a segment
`number, a compression block number within the segment, and a sector number
`within the compression block. We chose a physical segment size of 512 KBytes
`and a maximum compression factor of four, so our logical segment size was 2
`MBytes. Our compression blocks are 16 KBytes, and our sector size is 512 bytes.
`Thus, our logical disk addresses fit easily in a 32-bit word:
`
    segment (20 bits) | compression block (7 bits) | sector (5 bits)
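Packing and unpacking these fields is simple shift-and-mask work, as the C
sketch below illustrates (the names are ours, not taken from the Sprite
sources).

    #include <stdint.h>

    /* 32-bit logical address: segment (20) | compression block (7) | sector (5) */
    #define SECTOR_BITS 5
    #define CBLOCK_BITS 7

    typedef uint32_t logical_addr;

    static logical_addr
    make_laddr(uint32_t seg, uint32_t cblock, uint32_t sector)
    {
        return (seg << (CBLOCK_BITS + SECTOR_BITS))
             | (cblock << SECTOR_BITS)
             | sector;
    }

    static uint32_t laddr_segment(logical_addr a) { return a >> (CBLOCK_BITS + SECTOR_BITS); }
    static uint32_t laddr_cblock(logical_addr a)  { return (a >> SECTOR_BITS) & 0x7f; }
    static uint32_t laddr_sector(logical_addr a)  { return a & 0x1f; }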
`
`3.2 Reading a logical block
`
`Most of the modified file system deals exclusively in logical disk addresses. For
`example, the disk addresses used by the file system cache are logical, and so are the
`disk addresses placed in inodes. Physical addresses are needed only by modules at
`the lowest level of the file system.
`
`4
`
`10
`
`
`
[Figure 2: Logical and physical views of a segment. A 2 MByte logical segment
of 16 KByte compression blocks (with some unused logical space) is stored in a
512 KByte physical segment, which ends with the logical block map and any
unused physical space.]
`
`The module that reads a disk block given its logical address needs a way to
`find the physical address of the compressed bytes. We keep a logical block map
`for each segment, which is simply an array indexed by compression block number,
`whose entries are the physical byte addresses of the blocks relative to the start of
`the segment. The block map is constructed in memory as the segment is being
`compressed, and written to the end of the segment when the segment is full. The
`maps are needed for all file reads, so they are cached in memory whenever possible.
`Because our logical segments contain 128 compression blocks and our physical
`segments are 512 KBytes, our block maps each contain 128 four-byte entries,
`which exactly fills one 512-byte disk sector. (The entries could be reduced to two
`bytes each by restricting compression blocks to begin on eight byte boundaries
`within the segment, but we did not implement this optimization in our prototype.)
`The procedure for finding the compressed data associated with a logical address is
`as follows:
`
`1. Extract the segment number from the logical address. Use it to find the
`logical block map for the segment.
`
`2. Extract the compression block number from the address. Use it to index the
`logical block map. This yields the physical byte offset of the compressed
`data within the segment.
`
`5
`
`11
`
`
`
`3. Examine the next entry in the map, to find the start of the next block. This
`determines how much data should be read from the disk.
`
`4. Read the compressed data from the disk and decompress it.
`
`5. Extract the sector number from the logical address. Use it to identify the
`desired sector within the decompressed block.
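Put together, the read path might look like the C sketch below, which follows
the five steps above. This is our illustration, not Sprite LFS code: the
laddr_* helpers are from the sketch in Section 3.1, and block_map_for_segment,
read_physical, and decompress are hypothetical lower-level routines. The last
block in a segment, whose extent is bounded by the map's final entry, is
glossed over.

    #include <stdint.h>
    #include <string.h>

    #define SECTOR_SIZE 512
    #define CBLOCK_SIZE (16 * 1024)    /* one decompressed compression block */

    /* Hypothetical lower-level routines: */
    extern const uint32_t *block_map_for_segment(uint32_t seg);
    extern int  read_physical(uint32_t seg, uint32_t off, uint32_t len, void *buf);
    extern void decompress(const void *in, uint32_t inlen, void *out);

    int
    read_logical(logical_addr a, void *sector_out)
    {
        uint32_t seg    = laddr_segment(a);              /* step 1 */
        uint32_t cblock = laddr_cblock(a);
        const uint32_t *map = block_map_for_segment(seg);

        uint32_t start = map[cblock];                    /* step 2 */
        uint32_t end   = map[cblock + 1];                /* step 3 */

        unsigned char compressed[CBLOCK_SIZE + 64];      /* worst case: no compression */
        unsigned char plain[CBLOCK_SIZE];

        if (read_physical(seg, start, end - start, compressed) < 0)
            return -1;
        decompress(compressed, end - start, plain);      /* step 4 */

        memcpy(sector_out,                               /* step 5 */
               plain + laddr_sector(a) * SECTOR_SIZE, SECTOR_SIZE);
        return 0;
    }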
`
`Unfortunately, this procedure reads and decompresses a full compression block
`even if the caller wanted only some smaller unit, such as a file system block or
`a physical sector. We alleviate this problem by caching the entire decompressed
`block in memory, rather than just caching the requested sectors. The data could
`be placed in the file system buffer cache, but for simplicity in our prototype, we
`cached the last decompressed block within the read routine. Sprite LFS reads files
`sequentially in 4 KByte units, so this simple caching strategy typically achieves
`three hits for each 16 KByte compression block when reading large files.
`When the file system is reading non-sequentially, the additional time to read a
`full compression block cannot be hidden by caching. Fortunately, this time is small
`compared to the rotational latency. The time needed to decompress the full block
`in software is several milliseconds; it would be much smaller if decompression
`were implemented in hardware.
`
`3.3 Writing to the log
`
`Conceptually, it is simple to write compressed data to the log. As LFS constructs its
`element list, it can compress the data into a buffer, one compression block at a time.
`When there is no more data to add to the list, or when logical or physical segment
`space is exhausted, it writes the buffer to disk and updates the logical block map
`for the segment. In practice, however, the design was more complicated.
`In Sprite LFS, the contents of certain kinds of element, such as inode blocks,
`are allowed to change even after they are added to the element list; we will call
`these variable elements. For example, when LFS puts the first inode in a segment,
`it allocates an empty inode block and places it on the element list. When more
`inodes are added to the segment, they are placed in this inode block. Eventually,
`it may become full, at which point another block is allocated for inodes. Thus,
`an inode block may have inodes added to it long after it has been placed on the
`element list. A similar strategy is used for summary blocks—this technique allows
`LFS to make better use of disk space when allocating small data structures.
In an unmodified LFS, these variable elements present no difficulty; the contents
of elements can be changed freely until the segment is written to disk, as long
as their size does not change. However, in our system, the data for each element
`must be known before the element can be compressed, and compression must take
`place early enough to decide which elements fit into the current segment. We know
`of two ways to accommodate variable elements in a compressed LFS.
`One technique is to delay compressing variable elements until just before the
`segment is written to disk. Enough physical space must be reserved to hold the
`variable elements, assuming the worst-case compression ratio (i.e. that they cannot
`be compressed at all). With this approach, there is little benefit in compressing the
`variable elements, since the space saved cannot easily be used.
`A second technique is to delay compressing all elements as long as possible,
`allowing variable elements to be changed freely until they are compressed, but
`not afterwards. We do not compress elements as they are added; instead, we
`reserve enough physical space for each element to accommodate the worst-case
`compression ratio, until we can no longer be sure the next element will fit into
`the physical segment. Then we compress all the elements, correct the amount of
`physical space remaining to reflect the actual compression achieved, and mark the
`elements as no longer variable. When an inode block is compressed, for example,
`it can be marked as full so that no further inodes will be added to it. A new block
`will automatically be allocated if needed.
The following pseudo-code illustrates the action taken for each element elem,
containing size(elem) bytes. The worst_case function returns the largest
number of bytes that may be needed to represent the data when compressed.

    if (size(elem) > logical_space_left) {
        goto segment_full
    }
    if (worst_case(elem) > phys_space_left) {
        compress all elements not yet compressed
        increase phys_space_left by bytes saved
        mark inode blocks full
        if (worst_case(elem) > phys_space_left) {
            goto segment_full
        }
    }
    add elem to element list, without compression
    reduce logical_space_left by size(elem)
`
`This approach is less general than the first, since it works only if the code
`that modifies variable elements can be told to stop doing so at an arbitrary point;
`however, it does allow variable elements to be compressed.
`Both the summary blocks and inode blocks of Sprite LFS could be handled
using either technique. As a matter of expediency, we used the first technique for
summary blocks and the second for inode blocks, and we chose not to compress
`summary blocks. These choices worked out fairly well: there are few summary
`blocks per segment, so little compression is lost, and crash recovery is not adversely
`affected (see Section 3.4). Inode blocks tend to be less full than in unmodified
`LFS—two or three blocks might be allocated per segment where only one would
`have been used before—but little physical space is wasted because a partially-
`empty block compresses exceedingly well. However, programs that read many
`inodes (such as the file tree walker find) do not perform as well; see Section 6
`below for details.
`A third technique for dealing with variable elements is to eliminate them. We
`could have modified Sprite LFS to fill summary blocks and inode blocks completely
`before adding them to the element list. This would have been difficult, however,
`because the logical disk address of a data block is not known until it is added to
`the element list, and the current LFS implementation needs to know the addresses
`of summary blocks and inode blocks as it fills them in, for use as pointers in other
`data structures. Also, LFS needs to know that the summary block it is filling will
`fit into the current segment, not be spilled over into the next one. Therefore we
`chose not to take this approach in our prototype.
`
`3.4 Crash recovery
`
`Our changes introduce a new problem in crash recovery. The system could crash
`after a segment has been written, but before the logical block map has been
`committed to disk. One might view this as an argument for writing the logical
`block map first, at the start of the segment. However, LFS often fills a segment
`using a small number of partial writes—these are used when it is necessary to
`commit data to disk, but the data does not fill an entire segment. So, on successive
`partial writes the segment’s block map would be updated in place, and could
`therefore be lost in a crash. Alternatively, we could allocate a new block map for
`each partial segment written, but this would consume additional disk space and
`complicate the process of reading and parsing the maps.
`Fortunately, this problem is easy to solve. We maintain the invariant that all
`segments on the disk have a valid logical block map on disk, except possibly the
`current segment. When the file system examines the disk after a crash, it must first
`read the current segment, decompress it from the beginning, and reconstruct the
`logical block map. This action is similar to rolling forward from a checkpoint.
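The reconstruction is a single pass over the segment, as in the hypothetical C
sketch below. We assume a routine decompress_block that consumes one
compression block's worth of compressed input, produces its decompressed
contents, and returns the number of compressed bytes consumed (zero at the end
of valid log data).

    #include <stdint.h>

    extern uint32_t decompress_block(const unsigned char *in, uint32_t avail,
                                     unsigned char *out);

    /* Rebuild the logical block map of the current segment after a crash. */
    void
    rebuild_block_map(const unsigned char *seg_buf, uint32_t seg_len,
                      uint32_t map[])
    {
        unsigned char plain[16 * 1024];
        uint32_t off = 0;
        unsigned i = 0;

        while (off < seg_len) {
            uint32_t used = decompress_block(seg_buf + off, seg_len - off, plain);
            if (used == 0)
                break;              /* ran off the end of valid data */
            map[i++] = off;         /* compression block i began here */
            off += used;
        }
        map[i] = off;               /* bounds the final block */
    }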
`We do not compress the checkpoint areas or the summary blocks. Together,
`these blocks contain enough information for LFS to find the end of the log during
recovery: we modified the segment cleaner to read segments without using the
block map in order to exercise this algorithm.
`
`3.5 Compression block size
`
`We would like to compress data in large blocks, but there are reasons for choosing
`smaller blocks.
`First, when reading bytes from the disk, decompression must start from the
`beginning of a compressed block. Thus, the mean latency for reading a randomly
`chosen byte on the disk is increased by the time to read the block from the disk
`and decompress it. A 16 KByte block seems to be a reasonable choice, since it is
`large enough to allow the compression algorithm to obtain most of the compression
`available, but small enough that it adds only a few milliseconds to the transfer time.
In software on a DECstation 5000/200 (DECstation is a trademark of Digital
Equipment Corporation), the decompression time is 9 ms for the
`fastest algorithm we tried; in hardware, decompression would be faster than the
`disk transfer, and could be overlapped with it more easily.
`A second issue is that applications often commit small amounts of data to
`disk, resulting in poor compression. The obvious way to overcome this problem is
`to employ a small amount of non-volatile memory to postpone compression until
`more data is available. Alternatively, one can ignore the problem and write the data
`immediately, because the segment cleaner will later combine the data from adjacent
`writes to form a full-sized compression block. This is similar to the strategy used
`in unmodified LFS to recover unused space left in inode and summary blocks after
`small amounts of data have been committed.
`The cleaner saves space in other minor ways too. In our system, when two
`compression blocks are written together, no gap need be left between them. But
`when two blocks are forced to disk independently, the second must start on a sector
`boundary to avoid the danger of damaging previously written data. This causes
`fragmentation that is removed later by the segment cleaner.
`
`3.6 Free space
`
`One problem with using compression in a file system is that it leaves the naive user
`open to surprises. For one thing, it is no longer obvious how to report the amount
`of space remaining on the disk! It seems best to report the amount of data that
`could be put on the disk assuming worst-case compression (i.e. no compression at
`all), so that the user is more likely to be agreeably surprised than upset.
`A more worrying problem is that actions that do not consume disk space in
`conventional file systems may do so when compression is used. For example, if a
`
`2DECstation is a trademark of Digital Equipment Corporation
`
`9
`
`15
`
`
`
`block in a file is overwritten, the new data may not compress as much as the old
`data. When compression is used, such operations cannot be allowed when the disk
`is full.
`
`3.7 File-specific compression
`
`Different compression algorithms are better suited to different data, and so one
`might like to choose the algorithm to be used for each file. One way to do this in
`our system would be to place some additional hints in each inode. The LFS module
`that places file data blocks into the element list could use the hints to mark elements
`for compression by different algorithms. One possible use of such a scheme would
`be to indicate that certain files should not be compressed at all; this would free
`those files from the restriction noted in the previous subsection.
`While file-specific compression could easily be applied to our current software
`implementation, it would certainly complicate the hardware design described in
`Section 5. Moreover, only large files would benefit from this approach, since only
`one compression algorithm would be used for each 16 KBytes of data.
`
`4 Compression algorithms
`
`A wide variety of compression algorithms could be used for this application,
`but fast, simple algorithms are clearly preferable. One suitable algorithm was
`introduced by David Wheeler in his red and exp programs [13]; similar algorithms
`were also described by Raita and Teuhola [8]. Williams has developed algorithms
`that are particularly fast for decompression [15, 16].
The simplest form of Wheeler’s algorithm (which we will call Algorithm 1) is
as follows. A hash table t, whose entries are bytes, is initialized to a known state,
such as all zeroes. For each input byte c, the compressor constructs a hash value
h based on the preceding three bytes. The value found in table entry t[h] is called
the predicted character. If t[h] = c, the compressor outputs a token indicating
a correct prediction. If t[h] ≠ c, the compressor sets the entry t[h] to be c, and
outputs a token indicating that prediction failed and that the input character was c.
If this algorithm expands the block, the compressor outputs a token indicating that
no compression has occurred, followed by a copy of the original data.
The decompressor maintains a similar hash table t. For each token read from
the compressor, the decompressor constructs a hash value h based on the last three
bytes written to its output. The hash function and initial setting of t must be identical
in compressor and decompressor. If the token indicates a successful prediction,
the decompressor reads and outputs the byte t[h]. If it indicates prediction failure
with the input character c, the decompressor sets the entry t[h] to c and outputs c.
In our tests, using Sprite system binaries and an operating system source tree,
this algorithm predicts around 50% of the bytes in a typical 16 KByte block and
compresses it down to 10 KBytes. We used a modest hash table size (n = 4096
entries) and a simple hash function (h = (256c3 ⊕ 16c2 ⊕ c1) mod n, where ci is
the ith character preceding the current character c, and ⊕ is bitwise exclusive-or),
and we chose to encode a correct prediction with one bit and a missed prediction
with nine bits.
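The C sketch below shows the compressor side of Algorithm 1 with these
parameters. It is our reconstruction from the description above, not Wheeler's
code, and the escape for blocks that expand is omitted; the decompressor
mirrors it, reading one flag bit and, on a miss, the literal byte.

    #include <stddef.h>
    #include <stdint.h>

    #define TABLE_SIZE 4096

    /* Minimal MSB-first bit writer.  out[] must be zeroed by the caller
     * and large enough for 9 bits per input byte in the worst case. */
    struct bits { uint8_t *out; size_t pos; };

    static void put_bit(struct bits *b, unsigned v)
    {
        if (v)
            b->out[b->pos >> 3] |= (uint8_t)(1u << (7 - (b->pos & 7)));
        b->pos++;
    }

    static void put_byte(struct bits *b, uint8_t v)
    {
        for (int i = 7; i >= 0; i--)
            put_bit(b, (v >> i) & 1u);
    }

    /* Algorithm 1: one bit for a correct prediction, nine bits (flag
     * plus literal) for a miss.  Returns the output length in bits. */
    size_t alg1_compress(const uint8_t *in, size_t n, uint8_t *out)
    {
        uint8_t t[TABLE_SIZE] = { 0 };     /* prediction table, all zeroes */
        unsigned c1 = 0, c2 = 0, c3 = 0;   /* the three preceding bytes */
        struct bits b = { out, 0 };

        for (size_t i = 0; i < n; i++) {
            unsigned h = ((256u * c3) ^ (16u * c2) ^ c1) % TABLE_SIZE;
            if (t[h] == in[i]) {
                put_bit(&b, 1);            /* correct prediction */
            } else {
                t[h] = in[i];              /* update the prediction */
                put_bit(&b, 0);
                put_byte(&b, in[i]);
            }
            c3 = c2; c2 = c1; c1 = in[i];
        }
        return b.pos;
    }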
`An improved version of the algorithm (which we will call Algorithm 2) adds a
`second hash table holding an alternative prediction, and a token to indicate a correct
`prediction in the second table. In addition, it applies Huffman coding to the stream
`of predictions and misses, instead of using a fixed-length code for misses. Our
`implementation uses a static Huffman code computed from frequencies of bytes
`seen on a typical UNIX disk.
`Besides the algorithms based on Wheeler’s ideas, we tried the LZRW1-A and
`LZRW3-A algorithms due to Williams. A full description and implementation are
`available elsewhere [15, 16], so we omit the details here.
`
    Compressor     Compression ratio   Compression speed   Decompression speed
                   (output/input)      (KBytes/s)          (KBytes/s)
    Algorithm 1    61%                 1700                2200
    Algorithm 2    51%                  590                 670
    LZRW1-A        52%                  910                2400
    LZRW3-A        48%                  360                1500
    compress       50%                  250                 410
    zoo -h         36%                   45                 390

This table shows the performance of six compression algorithms on 240 MBytes of
data from a log-structured file system containing Sprite operating system source and
binaries. The figures give the compression speed, decompression speed, and the ratio
of output size to input size (compression ratio). Compression was performed on 16
KByte blocks. Compression speed is in kilobytes of uncompressed data processed
per second of CPU time, given to two significant figures. Times were measured on a
DECstation 5000/200.

Table 2: Comparison of compression algorithms.
`
`Table 2 illustrates the performance of these algorithms on data typical for a
UNIX program development environment. For comparison, we include figures for
the popular compress utility, which uses the LZC algorithm, and the zoo archiver,
`which uses the LZSS algorithm. The table shows that the algorithms we chose are
`quite fast, but better compression could be obtained by sacrificing speed.
`Bell, Witten, and Cleary give a more thorough comparison of compression
`algorithms and their effectiveness on different sorts of data [3]. They also describe
`the LZC and LZSS algorithms.
`
`5 LFS and compression hardware
`
`We have arranged our system so that the compression algorithm can be changed
`easily. An obvious improvement would be to replace the compression routine
`with a piece of hardware. As noted in the introduction, this would reduce the
`performance penalty exacted by software compression, and would increase the
`effective disk transfer rate. We have not yet integrated compression hardware into
`our prototype, but in this section we discuss how it might best be done.
`
`5.1 System changes
`
`The simplest way to add hardware compression to our design would be to build a
`DMA device that reads uncompressed data from one buffer and writes compressed
`data to another (and vice versa), and use it as a direct replacement for our software
`compression module.
Consider a disk with a transfer rate of d MBytes per second, and a compressor
that achieves compression ratio r at a rate of c MBytes per second. Without
compression, the time to transfer n MBytes is n/d seconds. With a separate
hardware compressor, this becomes n/c + nr/d seconds. (Seek and rotational
delays are unaffected. We assume that DMA setup times are negligible for blocks
of more than a few kilobytes.) We would like to reduce the total time, which
implies that c > d/(1 - r). For example, when r = 0.5 and d = 2 MBytes per
second, c must exceed 4 MBytes per second.
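The threshold follows from comparing the two expressions directly; in LaTeX
notation, the one-line derivation (assuming, as above, that compression and
disk transfer are not overlapped) is:

    \frac{n}{c} + \frac{nr}{d} < \frac{n}{d}
    \;\Longrightarrow\; \frac{1}{c} < \frac{1-r}{d}
    \;\Longrightarrow\; c > \frac{d}{1-r}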