`Chu
`
`[54]
`
`IDGH SPEED LOSSLESS DATA
`COMPRESSION SYSTEM
`
`[75]
`
`Inventor: Ke-Chiang Chu, Saratoga, Calif.
`
`[73] Assignee: Apple Computer, Inc., Cupertino,
`Calif.
`
`[21] Appl. No.: 992,972
`
`Dec. 18, 1992
`
`[22] Filed:
`Int. Cl.6
`..........................•........................... H03M 7/30
`[51]
`[52] U.S. CI . ................................................. 341/51; 341/86
`[58] Field of Search .................................. 341/51, 65, 67,
`341187, 95, 106, 107
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`3,394,352
`4,558,302
`5,010,345
`5,045,852
`5,184,126
`5,220,417
`5,247,638
`
`711968 Wemikoff ................................. 341151
`12/1985 Welch ..................................... 340/347
`4/1991 Nagy ......................................... 341/65
`9/1991 Mitchell et al ........................... 341/51
`2/1993 Nagy ......................................... 341167
`6/1993 Sugiura ................................... 358/515
`9/1993 O'Brien .................................... 341187
`
`FOREIGN PATENT DOCUMENTS
`
`2-262766 10/1990
`
`Japan ............................... H04N 1/41
`
`OTHER PUBLICATIONS
`
`IEEE, Computer, Jun. 1984, pp. 8-19, Terry A. Welch,
`Sperry
`Research
`Center,
`"A
`Technique
`for
`High-Performance Data Compression".
`IEEE, Transactions On Information Theory, vol. II 23, No.
`3, May 1977, pp. 337-343 J. Ziv, A. Lempel, "A Universal
`Algorithm for Sequential Data Compression".
`IEEE, Transactions On Information Theory, vol. II 24, No.
`
`I lllll llllllll Ill lllll lllll lllll lllll lllll 111111111111111111111111111111111
`5,467,087
`Nov. 14, 1995
`
`US005467087 A
`[11] Patent Number:
`[45J Date of Patent:
`
`5, Sep. 1978, pp. 530--536, J. Ziv, A. Lempel, "Compression
`of Individual Sequences via Variable-Rate Coding".
`IEEE Transactions on Information Theory vol. II, 21, No. 2,
`Mar. 1975, pp. 194-203. Peter Elias, "Universal Codeword
`Sets and Representations of the Integers".
`Communications of the ACM, Apr. 1989 Vol. 32 No. 4, pp.
`490--504, Edward R. Riala and Daniel H. Greene, "Data
`Compression with Finite Windows".
`Timothy C. Bell, John G. Cleary, Ian H. Witten, Text
`Compression, Prentice Hall, Englewood Cliffs, New Jersey,
`1990, pp. 206-243.
`System Enhancement Associates, ARC File Archive Utility
`Version 5.1, copyright 1985, 1986, p. 2.
`
`Primary Examiner-Howard L. Williams
`Attorney, Agent, or Firm-Irene Y. Hu
`
`[57]
`
`ABSTRACT
`
`A data compression process and system that identifies the
`data type of an input data stream and then selects in response
`to the identified data type at least one data compression
`method from a set of data compression methods that pro(cid:173)
`vides an optimal compression ratio for that particular data
`type, thus maximizing the compression ratio for that input
`data stream. Moreover, the data compression process also
`provides means to alter the rate of compression during data
`compression for added flexibility and data compression
`efficiency. Furthermore, a system memory allocation process
`is also provided to allow system or user control over the
`amount of system memory to be allocated for the memory
`intensive data compression process. System memory allo(cid:173)
`cation process estimates the memory requirement to com(cid:173)
`press the input data stream, and allocates only that amount
`of system memory as needed by the data compression for
`memory allocation efficiency.
`
`34 Claims, 8 Drawing Sheets
`
`-,
`
`101
`
`I -------(------
`~
`IDENTIFY INPUT DATA
`TYPE AND GENERATE
`DATA 1YPE ID SIGNAL
`
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`1~--1---~
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`jJlli
`
`1
`
`COMPRESS DATA
`IN ACCORDANCE
`WITH IDENTIFIED
`DATA TYPE
`
`10ll
`
`1Q1.
`
`I : _______________ ~,
`
`fill
`
`NetApp; Rackspace Exhibit 1013 Page 1
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 1of8
`
`5,467,087
`
`_V)l_a_b_hd_~_fg-~-·-----a-bc_d_··_· ________ __,~
`I <
`Po
`> hto:>-1
`16 f
`15!
`
`. · l
`
`I
`
`12
`22
`,-A-v-"-.
`
`-L-LO_a_bcd-LL-1 -e-fg -·-· • .... L-Ln....., r · +OL1
`
`/20
`I
`
`24
`,-A-.
`
`FIG. I (PRIORART)
`
`~~abed···
`
`r1
`
`17
`
`FIG. 2 (PRIORART)
`
`NetApp; Rackspace Exhibit 1013 Page 2
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 2 of 8
`
`5,467,087
`
`~I
`
`~I
`
`i.n
`h
`-.:t"
`"---._
`
`C) z
`z
`0
`Cl
`Ci5
`0
`(.) ~en
`z
`<Cw
`<( off:
`~ ~
`u. u.
`0
`(.)
`:;:)
`:r:
`
`' l
`
`~
`CtJ
`-.:t"
`
`C)
`z
`Cl
`0
`(.)
`~
`
`z
`0
`en
`<(Cf)
`1-W
`<Ca:
`Cle....
`~
`0
`(.)
`
`.h
`
`~
`
`0
`
`i='
`a:
`<(
`a:
`0
`a:
`
`a.. -
`
`NetApp; Rackspace Exhibit 1013 Page 3
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 3 of 8
`
`5,467,087
`
`r - - --------------,
`
`101
`~
`
`1 '
`IDENTIFY INPUT DATA
`TYPE AND GENERATE
`DATA TYPE ID SIGNAL
`
`.1Q6.
`
`1f
`
`COMPRESS DATA
`IN ACCORDANCE
`WITH IDENTIFIED
`DATA TYPE
`
`100.
`
`.1.Q2.
`
`107
`
`1r
`RETRIEVE DATA
`TYPE INFORMATION
`OF COMPRESSED
`DATA STREAM
`
`.1ill
`
`B
`
`1r
`DECOMPRESS DATA
`IN ACCORDANCE
`WITH IDENTIFIED
`DATA TYPE
`
`112
`
`1M
`
`FIG. 4
`
`NetApp; Rackspace Exhibit 1013 Page 4
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 4 of 8
`
`5,467,087
`
`J 101
`..----------------,
`1r
`IDENTIFY INPUT DATA
`TYPE
`.11A
`
`+
`
`SELECTPMAX
`AND LMAX
`
`.115
`
`+
`
`GENERATE DATA TYPE
`ID SIGNAL AND
`PROVIDE SIGNAL TO
`COMPRESSION PHASE
`.ill.
`
`10.Q
`
`A
`
`.1Q8
`
`11r
`
`SELECT AT LEAST ONE
`DATA COMPRESSION
`METHOD FROM A SET
`OF DATA COMPRESSION
`METHODS ACCORDING
`TO INPUT DATA TYPE
`SIGNAL AND CONTROL
`RATE SIGNAL
`
`11.a
`
`1J
`
`COMPRESS INPUT
`DATA STREAM
`USING SELECTED
`COMPRESSION METHOD
`.120.
`
`FIG.
`
`NetApp; Rackspace Exhibit 1013 Page 5
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 5 of 8
`
`5,467,087
`
`A
`
`1----------------,
`I
`I
`I
`I
`I
`I
`I
`I
`I
`
`, '
`ESTIMATE
`MEMORY
`REQUIREMENT
`
`,,
`ALLOCATE
`MEMORY
`FOR DATA
`COMPRESSION
`
`14.Q
`
`1 r
`
`COMPRESS DATA
`USING A SELECTED
`LZ-TYPE COMPRESSION
`METHOD
`
`122.
`
`'r
`
`COMPRESS DATA
`USING A SELECTED
`HUFFMAN CODE
`METHOD
`
`ill.
`
`103
`
`MONITOR
`- - - COMPRESSION RATE
`CONTROL SIGNAL
`126.
`
`INCREASE THE SPEED
`OF COMPRESSION BY
`SWITCHING LZ METHOD OR
`DECREASE PMAX LMAX
`~
`
`FIG. 6
`
`NetApp; Rackspace Exhibit 1013 Page 6
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 6 of 8
`
`5,467,087
`
`B
`
`1 r
`
`SELECT A HUFFMAN
`LOOKUP TABLE
`ASSOCIATED WITH THE
`IDENTIFIED DATA TYPE
`132
`
`1 r
`
`DECOMPRESS DATA
`USING SELECTED
`LOOKUP TABLE TO
`GENERATE A SET OF
`(p' L')
`.1M
`
`'
`DECOMPRESS THE
`p' L' TO GENERATE
`EXPANDED DATA
`
`~
`
`ill.
`
`FIG. 7
`
`NetApp; Rackspace Exhibit 1013 Page 7
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 7 of 8
`
`5,467,087
`
`~I
`
`~I
`
`z
`0
`Cf)~
`<(Cf) w
`I-WI(cid:173)
`<(a: Cl)
`Cl c... >-
`~Cf)
`0
`0
`
`~
`LO
`C>
`T"""
`
`---
`
`z
`0
`U5
`Cf)
`w~
`~a:w
`<( C...1-
`
`Cl~ Cf) o>-ocn
`I w
`a:
`c...
`
`T"""
`0
`
`NetApp; Rackspace Exhibit 1013 Page 8
`
`
`
`U.S. Patent
`
`Nov. 14, 1995
`
`Sheet 8 of 8
`
`5,467,087
`
`gl
`
`r--"
`0)
`C>
`T"""
`
`z
`0
`U5
`U) w
`<a:~
`t-a...w
`<~I-
`oOoo o>-
`
`WU)
`Cl
`
`I w a:
`a...
`
`-
`
`~
`
`NetApp; Rackspace Exhibit 1013 Page 9
`
`
`
`5,467,087
`
`1
`ffiGH SPEED LOSSLESS DATA
`COMPRESSION SYSTEM
`
`This patent application relates to copending and concur(cid:173)
`rently filed patent application having the following patent
`application serial number and filing date: Ser. No. 07/993,
`181, filed Dec. 18, 1992. This patent application and this
`copending patent application are commonly owned at the
`time of filing of this patent application.
`
`FIELD OF INVENTION
`
`This invention relates ·to electronic data manipulation
`processes. More specifically, this invention relates to elec(cid:173)
`tronic data compression systems used with a computer.
`
`BACKGROUND
`
`5
`
`2
`("FfP"), the FfP process queries the user to supply the data
`type information of the input data stream. Other prior art
`data manipulation processes include requiring a user to set
`a data type mode bit, or to assume a particular data type of
`the input data stream. Assuming a particular data type is an
`inefficient method of manipulating data. If an electronic data
`manipulation process always assumes the data type to be 8
`bits, when in reality the input data type comprise 7 bits, the
`data type assumption by the process then results in a
`10 substantial waste of system memory to reserve an additional
`bit for each data symbol in the input data stream. Thus, it
`would be desirable to provide a method to automatically
`detect the data type of an input data stream.
`Additionally, typical prior art data compression tech-
`15 niques are classified either as a statistical or a dictionary type
`of data compression method. A statistical type of data
`compression is based on single symbol coding. Single
`symbol coding is accomplished by assigning to each pos(cid:173)
`sible data symbol in the input data stream a probability for
`the appearance of that symbol. Examples of this type of data
`20 compression method are the Huffman code method and the
`widely published variations of this code. With the Huffman
`coding method, a symbol having a greater probability of
`appearance is encoded with a short binary string, while a
`symbol having a lower probability of appearance in the input
`25 data stream is encoded with a longer binary string.
`A dictionary type data compression method associates
`groups of consecutive characters, as in phrases, to a dictio(cid:173)
`nary of indices. The dictionary type data compression meth(cid:173)
`ods are also commonly referred to as a "codebook" or a
`"macro coding" approach. The various coding schemes in
`the Ziv-Lempe! ("LZ") family of data compression tech-
`niques are all examples of the dictionary type of data coding
`method. In the LZ family of data compression methods, a
`typical LZ-type compression method processes an input data
`stream by checking first if each current data sqing encoun-
`tered in the input data stream matches a data string already
`stored in the output data buffer. If no match of the current
`data string to previously stored data strings is detected, the
`current data string is stored into the output buffer. If,
`however, a match is detected between the current data string
`and a data string already stored in a memory location of the
`output data buffer, a pointer indicating that memory location
`is stored into the output buffer instead of the data string.
`Shown in FIGS. 1 and 2 are two examples of LZ data
`compression methods. The LZ-1 compression method
`shown in FIG. 1 processes an uncompressed input data
`stream 10 to generate a compressed data output stream 20 by
`comparing an uncompressed portion 13 of input data stream
`10 to data in a history buffer 11 of already processed input
`data. If a matching data string 12 is located in history buffer
`11 for current data string 14, data string 14 is encoded in
`compressed data stream 20 as a pointer (p0 , 10 ) 24, corre(cid:173)
`sponding to an offset p0 15 and a data length 10 16. The
`, 10
`) 24 thus replaces longer
`shorter length data of pointer (p0
`data string 14 in output compressed data stream 20.
`History buffer 11 is considered to comprise no data at the
`time prior to data compression of input data stream 10. As
`the compression process progresses, history buffer 11
`60 expands within a given system memory reserve according to
`how much of input data stream 10 has been processed until
`history buffer 11 reaches the maximum system memory
`allocation available for data compression. Thus, in the case
`where no matching string is found, as in the case for data
`65 string 12 during the initial data compression stage of input
`data stream 10, unmatched string 12 is stored into output
`) 22
`data stream 20 in the form of a literal length header (LL0
`
`35
`
`30
`
`Having an efficient data compression system is increas(cid:173)
`ingly significant as electronics manufacturers compete with
`each other for compactness and improved performance in
`their electronic products. In particular, an increasing market
`demand for a variety of portable electronic products has
`resulted in requiring a substantial reduction to the system
`real estate available for electronic data storage and data
`manipulation in the designs of these products. Thus, with
`less electronic memory available, having an efficient data
`compression method is even more critical in the designs of
`portable electronics, if these devices are to achieve the
`comparable operation of a larger electronic system.
`A variety of data compression techniques are known. The
`performances of each of these various data compression
`techniques are measured by the compression ratio, which is
`the length of an uncompressed input data stream to the
`length of its corresponding compressed data stream follow-
`ing data compression. The compression ratio for each data
`compression technique, however, also varies depending on
`the data type of the input data stream. Some data compres(cid:173)
`sion techniques have a higher compression ratio for ASCII
`type input data than for binary data type, while other data 40
`compression techniques result in a lower compression ratio
`for ASCII data type and a higher ratio for binary data type.
`Thus, for each data type, one or more data compression
`techniques can be identified which will provide an optimal
`data compression ratio according to that data type, while 45
`other data compression techniques producing a lower com(cid:173)
`pression ratio for that particular data type should be avoided.
`A variety of data types are known and used by the industry
`to encode characters, punctuation marks, and other symbols
`found in texts and communication protocols. Known data 50
`types include ASCII standard format, binary standard for(cid:173)
`mat, and unicode standard format. Although ASCII standard
`comprises a set of 8-bit binary numbers, only 7 of these bits
`are typically used to represent an actual data symbol, while
`binary standard format encodes one data symbol in 8 bits. 55
`Unicode represents each data symbol with two bytes, or a set
`of 16-bit binary numbers. The first byte, or the first 8-bit
`prefix, indicates a data characteristic information of the
`16-bit data symbol. For example, the first byte might indi(cid:173)
`cate that the 16-bit data symbol is a Kanji character.
`However, despite the variety of data types that are com(cid:173)
`monly used in the industry, prior art data manipulation
`processes do not include automatic detection of the data type
`of an input data stream. Most prior art data manipulation
`processes rely on the user or another source external to the
`data manipulation process itself to supply such data type
`information. For example, in a file
`transfer program
`
`NetApp; Rackspace Exhibit 1013 Page 10
`
`
`
`5,467,087
`
`4
`FIG. 2 illustrates an example of the prior art LZ-2 data
`compression;
`FIG. 3 illustrates an example of a prior art data compres(cid:173)
`sion system;
`FIG. 4 hows a block diagram of one embodiment of a
`lossless data compression and decompression process taught
`in accordance with the principles of this invention;
`FIG. 5 illustrates a detailed block diagram of one embodi(cid:173)
`ment of the data compression process of FIG. 4;
`FIG. 6 illustrates a detailed block diagram of one embodi(cid:173)
`ment of the input data compression process of FIG. 5;
`FIG. 7 illustrates a detailed block diagram of one embodi(cid:173)
`ment of the data decompression process shown of FIG. 4;
`FIG. 8 shows a block diagram of one embodiment of a
`lossless data compression system constructed in accordance
`with the principles of this invention; and
`FIG. 9 illustrates a block diagram of one embodiment of
`a data decompression system constructed in accordance with
`the principles of this invention.
`
`DETAILED DESCRIPTION
`
`20
`
`3
`followed by data string 12 duplicated from original data
`stream 10. Literal length header 22 encodes the number of
`characters, n, in unmatched string 12 that follows literal
`length header 22. This encoded information is recovered
`during data decompression to notify the decompression 5
`process of the number of data characters following literal
`length header 22, corresponding to the original input data
`that need not be expanded.
`The LZ-2 data compression method of FIG. 2 searches for
`matching current data string 14 in a dictionary 30 of indices. 10
`Dictionary 30 comprises a limited buffer length and data
`strings from input data stream 10. If a matching data string
`12 is located in dictionary 30 for current data string 14,
`current data string 14 is then encoded in the output data
`stream with index 32 corresponding to the location of data
`string 12 in dictionary 30. Because the LZ-1 method of FIG. 15
`1 searches for a matching data string character by character
`through the history buffer, the time required to compress
`input data stream 10 is substantially greater when using the
`LZ-1 method of FIG. 1 than with the LZ-2 method of FIG.
`2. However, the LZ-1 method provides a greater data
`compression ratio than the LZ-2 method.
`Data decompression is the conversion of a stream of
`compressed data back to its original expanded form. Decom(cid:173)
`pression is typically accomplished with a lookup table, if the 25
`data was compressed using a statistical or a Huffman type
`coding scheme. If the data was compressed using a dictio(cid:173)
`nary type data compression method, such as the LZ-1
`method (as explained above with reference to FIG. 1),
`original data stream 10 is reconstructed by replacing each 30
`pointer (p, 1) encountered in compressed data stream 20 with
`the data string in the history buffer located at offset p. If the
`data was compressed with an LZ-2 data compression
`scheme (as explained above with reference to FIG. 2), the
`dictionary generated during data compression is used to 35
`retrieve the indexed data strings.
`FIG. 3 illustrates a typical prior art data compression
`system. Data compression system 40 receives an input
`uncompressed data stream 10 and processes data stream 10
`through a first data compression phase 42 using a first 40
`predefined data compression technique. Alternatively, prior
`art data compression system 40 may also provide a second
`data compression phase 44 using a second data compression
`technique also predefined by the design of data compression
`system 40. Prior art data compression systems thus use the 45
`same data compression techniques incorporated by the data
`compression system design regardless of the data type
`encountered in the input data stream. Because each data
`compression technique typically provides a different com(cid:173)
`pression ratio for different data types, prior art compression 50
`systems are unable to maximize the data compression ratio
`when encountering a variety of input data types in the input
`data stream. There is therefore a need to provide an efficient
`and flexible data compression system that maximizes the
`data compression ratios according to the input data type 55
`detected. Moreover, prior art data compression systems also
`do not maximize the usage of the CPU, such as to provide
`normal rate of data compression during the CPU's idle time,
`but increasing the rate of data compression when the CPU is
`preparing to process another task. It is therefore also desir- 60
`able to have a data compression system that provides
`controlling means to increase or decrease the system's rate
`of data compression.
`
`FIG. 4 shows one embodiment of a high speed lossless
`data compression and decompression process 100 of the
`present invention. Data compression process 102 comprises
`two phases: a data pre-compression phase 106 and a com(cid:173)
`pression phase 108. Similarly, data decompression process
`104 also comprises two phases: a data type retrieval phase
`110 and a decompression phase 112. During data compres(cid:173)
`sion process 102, data pre-compression phase 106 first
`receives an uncompressed input data stream 101 and iden(cid:173)
`tifies the data type of the input data stream. Data pre(cid:173)
`compression phase 106 also generates a data type identifi(cid:173)
`cation signal. Compression phase 108 then selects a data
`compression method from a set of data compression meth-
`ods according to the data type identification signal.
`Itis envisioned as within the scope of the principles taught
`in accordance with this invention that the set of data com(cid:173)
`pression methods can include a variety of data compression
`methods, such as the data compression methods from the
`LZ-type family of data compression methods, the Huffman
`code family of data compression methods, or other such data
`compression methods, including the Arithmetic code, or
`combinations of such data compression methods. In the
`preferred embodiment of this invention, the set of compres-
`sion methods comprises a combination of LZ-type/Huff(cid:173)
`man-type compression methods. For example, if the input
`data type is identified as ASCII, the data type identification
`signal from pre-compression phase 106 indicates to com(cid:173)
`pression process 108 to select an LZ-1 and a Huffman type
`HA combination of compression methods that is designed to
`provide an optimal compression ratio for ASCII type data.
`According to the selected combination of compression
`methods, compression process 108 then compresses the
`input data stream first with the LZ-1 data compression
`method to generate a first set of compressed data. The first
`set of compressed data is then processed with the Huffman
`type compression method, HA, to provide a second set of
`compressed data. Likewise for other data types, one or more
`LZ-type/Huffman type combinations of compression meth-
`ods, which provide an optimal data compression ratio for
`one or more particular data types, can be included in the set
`of compression methods used during compression process
`65 108.
`FIG. 5 illustrates a detailed block diagram of the preferred
`embodiment of data compression process 102 of FIG. 4.
`
`BRIEF DESCRIPTION OF DRAWINGS
`
`FIG. 1 illustrates an example of the prior art LZ-1 data
`compression;
`
`NetApp; Rackspace Exhibit 1013 Page 11
`
`
`
`5,467,087
`
`10
`
`15
`
`5
`During data pre-compression 106, the data type of input data
`stream 101 is first identified with data type identification
`process 114. In the preferred embodiment, data type iden(cid:173)
`tification process 114 detects the input data type as either
`ASCII, binary, or unicode by analyzing a predefined number
`of bytes of input data stream 101.
`Typically, in the ASCII format, a data symbol is encoded
`in only 7 bits out of a set of 8 bits, while the binary format
`uses all 8 bits to represent a data symbol. Consequently, a
`byte of ASCII data corresponds to a decimal equivalent
`value in the range of 0-127, while a byte of data in binary
`format represents a decimal equivalent value in the range of
`0-255. Thus, data type identification process 114 detects
`whether each byte of input data stream 101 corresponds to
`a decimal equivalent value of greater than 127. If the current
`data byte corresponds to a decimal equivalent value greater
`than 127, than the data type of input data stream 101 is
`identified as binary. If the current data byte corresponds to
`a decimal equivalent value of less than 127, data type
`identification process 114 continues to check the next byte of
`input data until the end of the input data stream. A consistent
`pattern of data bytes, each comprising a decimal equivalent
`of less than 127, indicates that the input data type is ASCII.
`In the preferred embodiment, data type identification
`process 114 also detects for unicode format data type by 25
`comparing the first bytes of a predefined number of pairs of
`bytes in input data stream 101. A typical data symbol in
`unicode is represented by a pair of bytes, with the first byte
`always Indicating the data characteristic (e.g., Kanji char(cid:173)
`acter type) of the data encoded in the pair of bytes. Thus, if 30
`the first bytes of the predefined number of pairs of data bytes
`matches, then the data type of input data stream 101 is
`identified as unicode.
`Once the data type is identified, data pre-compression 106
`also preferably encodes is data type information in any
`known standard used in the industry as means for denoting
`the data type of a data stream. The typical standard used for
`denoting a data type of a data stream is to encode the data
`type information in a header located at the beginning of an
`output data buffer. The data type information is then
`retrieved by decoding the header during data decompression
`to identify the data type of the compressed data stream being
`decompressed.
`In the preferred embodiment of the present invention,
`during data pre-compression phase 106, a Pmax and lmax
`value are also selected which provide an optimal LZ-1 data
`compression ratio according to the identified data type.
`Table 1 illustrates a range of Pmax and lmax values for ASCII
`and binary type data that may be used with the LZ-1 data 50
`compression method. Selecting a lower Pmax typically
`increases the rate of data compression, while typically
`decreasing the compression ratio. Similarly, selecting a
`lower lmax also typically increases the rate of data compres(cid:173)
`sion, since a shorter character length 1 requires less search 55
`time. Selecting a lower lmax also typically results in a lower
`compression ratio. Thus, varying the Pmax and the lmax
`parameters typically produces a different compression time
`and a different compression ratio.
`
`6
`type signal generation process 116 generates a type identi(cid:173)
`fication signal to be provided to compression phase 108.
`Compression method selection process 118 selects an LZ(cid:173)
`type/Huffman-type combination compression method in
`5 accordance with the data type identification signal, and input
`data compression process 120 compresses the input data
`stream according to the selected data compression method.
`In the preferred embodiment of this invention, selection
`process 118 selects, in accordance with the data type iden-
`tification signal, a compression method having a first data
`compression phase comprising an LZ-1 compression and a
`second data compression phase comprising a Huffman-type
`compression or Arithmetic code compression.
`FIG. 6 illustrates a detailed block diagram of one embodi(cid:173)
`ment of compress input data process 120 of FIG. 5. In the
`preferred embodiment of compress input data process 120,
`a system memory allocation process 140 is provided to
`allow the system or the user control of the amount of system
`memory to be allocated for data compression. Memory
`allocation process 140 estimates the memory requirement
`20 necessary to compress input data stream 101 and then
`allocates that estimated amount of system memory for data
`compression process 120. In the preferred embodiment,
`memory allocation process 114 estimates the memory
`requirement in accordance to the identified input data type,
`a selected compression ratio, a selected speed of data
`compression, a selected Pmax and lmax value, or a selected
`combination of these features. As data compression process
`120 progresses and more system memory is needed to
`complete data compression of the input data stream,
`memory allocation process 140 then allocates additional
`increments of system memory to data compression process
`120. Alternatively, it is also envisioned as within the scope
`of the principles taught by this invention to have memory
`allocation process 140 provide an initial memory allocation
`35 of a predefined range of system memory for data compres(cid:173)
`sion process 120 without first estimating a memory alloca(cid:173)
`tion requirement. Memory allocation process 140 then sub(cid:173)
`sequently provides additional increments of system memory
`during the compression process as is needed.
`Once the initial system memory is allocated, first data
`compression process 122 commences data compression of
`the input data stream using the LZ-type data compression
`method to generate a first set of compressed data. During a
`45 second compression process 124, the first set of compressed
`data is compressed using the Huffman-type code compres(cid:173)
`sion method.
`In an alternative embodiment of input data compression
`process 120 also shown in FIG. 6, a compression rate control
`signal 103 is provided to data compression process 120.
`Data compression rate adjustment process 130 adjusts the
`values of lmav Pmax• or both lmax and Pmax to increase or
`decrease the compression process speed in response to data
`compression rate control signal 103. In an alternative
`embodiment of data compression rate adjustment process
`130, adjustment process 130 indicates to LZ-type compres(cid:173)
`sion process 122 whether to use LZ-1 or LZ-2 compression
`in accordance with data compression rate control signal 103
`to adjust the compression time for compressing data. Thus,
`data compression rate adjustment process 130 provides data
`compression process 100 with the flexibility to adjust the
`compression speed during data compression. This flexibility
`provides compression system 100 means to maximize the
`CPU's idle time to do data compression and means to
`increase the data compression speed when the CPU is in
`preparation to begin another process.
`FIG. 7 illustrates an example of a detailed block diagram
`
`40
`
`TABLE 1
`
`Data Type
`
`Pmax Range
`
`!max Range
`
`ASCII
`Binary
`
`2K-8K bytes
`16K-32K bytes
`
`16-2048 bytes
`16-256 bytes
`
`As shown in FIG. 5, once Pmax and lmax are selected, data
`
`60
`
`65
`
`NetApp; Rackspace Exhibit 1013 Page 12
`
`
`
`5,467,087
`
`15
`
`40
`
`7
`of the preferred embodiment of data decompression process
`112 of FIG. 4. Once the data type information of compressed
`data stream 107 is retrieved by decoding the header of
`compressed data stream 107, lookup table selection process
`132 selects a corresponding Huffman code lookup table that 5
`is associated with that data type. A first data decompression
`process 134 then processes the compressed data using the
`selected lookup table to generate a first set of decompressed
`data. A second decompression process 136 then processes
`the first set of decompressed data using the selected LZ type
`decompression codebook to provide as output an expanded
`original data stream. It is also envisioned as within the scope
`of the principles taught by this invention that other such data
`decompression algorithms may be substituted during data
`decompression process 112 to decompress compressed data
`stream 107, if another compression algorithm was selected
`during data compression process 102 in response to the
`particular data type of the original data stream.
`FIG. 8 illustrates the preferred embodiment of a data
`compression system 200 constructed in accordance with the
`principles of this invention. Data pre-compression system 20
`202 receives an input data stream 101 and identifies its data
`type. Data pre-compression system 202 also generates a data
`type identification signal 105 in response to the identified
`data type of input data stream 101. Data compression system
`204, which also receives input data stream 101, is coupled 25
`to data pre-compression system 202 to receive data type
`identification signal 105. Data compression system 204 thus
`compresses input data stream 101 in accordance with the
`identified input data type. In one embodiment of this inven(cid:173)
`tion, data compression system 204 selects in response to data 30
`type identification signal 105 at least one data compression
`method from a set of data compression method. Data com(cid:173)
`pression system 204 then processes input data stream 101
`according to the selected data compression method to gen(cid:173)
`erate a compressed data output stream 107. In another 35
`embodiment of this invention, data compression system 204
`receives a data compression rate control signal 103 and
`adjusts the selected data compression method in response to
`compression rate control signal 103.
`In the preferred embodiment of data compression system
`200, data pre-compression system 202 comprises data type
`identification and data type signal generation process 106
`and data compression system 204 comprises compression
`data process 108 as explained with reference to FIG. 4.
`FIG. 9 illustrates one embodiment of a data decompres(cid:173)
`sion system 300 constructed in accordance with the prin(cid:173)
`ciples of this invention. Data pre-decompression system 302
`receives a compressed data stream 107 and identifies its data
`type. In response to the identified data type, data pre(cid:173)
`decompression system 302 generates a compressed data type
`identif