throbber
United States Patent
`
`[:9]
`
`Che
`
`[54] LOSSLESS DATA COMPRESSION CIRCUIT
`AND METHOD
`
`[75]
`
`Inventor: Ke-Chiang Chit. Saratoga, Calif.
`
`[73] Assignee: The Board of Trustees of the Leland
`Stanford Junior University, Stanford,
`Calif.
`
`[21] App]. No.1 670,281
`
`[22] Filed:
`
`Mar. 15. 1991
`
`Int. Cl.5 ............................................ .. HOSM 7/36
`[51]
`[52] U.S. Cl. ...................................... .. 382/56; 341/51;
`341/67
`Field of Search .................... .. 382/56; 341/51. 61?,
`341/95. l06; 353/261.]. 261.4, 261.2
`
`{S8}
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`4.558.302 ‘.2./I985 Welch ................................. .. 3-1-U95
`4.941.193 T/1990 Barnsley et al
`38256
`5.C0l.47B 311991 Nagy ............... ..
`341.351
`5.003.307
`3/1991 Whiting et al.
`34l;'5l
`5.051.745 9/1991 Kaiz
`341.351
`
`
`
`Primary Examiner-—Leo H. Bondreau
`Assistant Exam:‘ner—Steven P. Klocinski
`Attorney. Agent. or .Fl‘rm——Flehr. Hohbach, Test,
`Albritton & Herbert
`
`[57]
`
`ABSTRACI
`
`ll|||||||||l|I|||||||||l|lll|||||||||||||ll||||llllll||||||l||||l||||ll||||
`usoosi sn43oA
`
`[11] Patent Number:
`
`5,150,430
`
`[45] Date of Patent:
`
`Sep. 22, 1992
`
`data string with a set of comparison data. and produces
`a sequence of codewords representing a sequence of
`successive. non-overlapping substrings of the new data
`string. A shift register stores and shifts the comparison
`data until all of the characters in the comparison data
`have been compared with the new data string. A com-
`posite reproduction length circuit finds the maximum
`length string within the set of comparison characters
`matching substrings of characters in the new data string
`beginning at each position in the new data string. The
`composite reproduction length circuit produces a multi~
`plicity of data pairs. one for each position of the new
`data string. Each data pair comprises a maximum length
`value. corresponding to the maximum length matching
`comparison string found for the new data substring
`starting at the corresponding position. and a pointer
`value denoting where the maximurri length matching
`comparison string is located in the comparison data. A
`codeword generator then generates a sequence of code-
`words representing the new data string, each codeword
`including one of these data pairs and representing a
`substring of the new data string. By using N such data
`compression units in parallel, with each storing an idcn~
`tical new data string. and each unit's shift register stor-
`ing and shifting a different subset of a specified compari-
`son string, processing time for generating codewords is
`reduced by a factor of approximately (N— l)/N.
`
`A lossless data compression circuit compares a new
`
`9 Claims. 6 Drawing Sheets
`
`200
`2o2/"/
`
`NEW DATA
`
`0: MATCI-i0...55
`
`Oracle 1008
`
`Oracle 1008
`
`
`
`BUFFER CODEWORD
`
`
`
`

`
`U.S. Patent
`
`Sep. 22, 1992
`
`Sheet 1 of 6
`
`5,150,430
`
`EIlIflEllllIIBElIIl!— '“'°”T5T“"“G
`
`-:ptr,length,tai|_character>=codeword=<125.5."r">
`
`If
`I
`I
`
`-lI|flBIfl!flIllfl;';1BIlII—
`
`
`
`90
`
`HISTORY BUFFER
`
`100
`
`/V/120
`
`CLK
`
`MAXIMUM REPRODUCTION _
`LENGTH REGISTER
`
`POINTER REGISTER
`
`E0
`
`%
`3|;
`w|.L
`asO
`O
`
`

`
`U.S. Patent
`
`Sep. 22, 1992
`
`Sheet 2 of 5
`
`5,150,430
`
`1 54
`
`152
`
`CODEWORD REGISTER
`
`coogwoan BUFFER
`
`CLK2
`
`L
`
`‘I60
`
`T
`
`
`
`1 64
`
`
`
`ADDRESS GENERATOR
`
`
`
`
`
`
`170
`
`MESSAGE
`
`
`BUFFER
`
`neconen
`SYMBOLS
`
`HIST
`
`I N
`
`OFIYBUFFER
`
`MUX
`
`T
`
`
`
`
`
`CODEWOFID
`
`BUFFER
`
`

`
`U.S. Patent
`
`Sep. 22, 1992
`
`Sheet 3 of 5
`
`5,150,430 I
`
`LD_CW1 .
`LD_Cw2.
`cw Fm
`230
`
`200
`
`\/\ SHlFT_NEW_DATA
`
` COUNTER CODEWOHD
`GENEFIATO‘
`
`
`
`ll
`
`
`LEN
`
`SHII-'|'_NEW__DATA.
`SHIP-'l'_OLD_DATA
`
`u
`
`P E
`
`FHOFNTY
`NODDER
`
`
`I
`
`EXTERNAL
`cooewono
`INTERFACE
`
`3°34’
`NEW BAND}
`
`
`
`
`3°‘
`
`LD_CW2
`
`CW_HDY-—§
`
`FIGURE 10
`
`

`
`U.S. Patent
`
`Sep. 22, 1992
`
`Sheet 4 of 6
`
`5,150,430
`
`NEW DATA i-1
`
`NEW DATA i
`
`206
`
`
`
`
`
`ow DATA
`SHIFT
`REGISTER
`
`cmcum
`
`SH! FT_N EW_DATA
`
`OLD DATA i-1
`
`SHI FT_OL D_DATA
`CLK
`
`
`
`224
`
`W
`
`sun-r_uew
`__DA'l'A
`
`
`
` PTR(i—1)
`
`COUNTER
`(11 :0] FIGURE 9
`
`
`

`
`U.S. Patent
`
`Sep. 22, 1992
`
`Sheet 5 of 5
`
`5,150,430
`
`m?
`
`MATCH55
`
`MATCH54
`
`MATCH53
`
`MATCH52
`
`MATCH51
`
`MATCH50
`
`VCC
`
`
`
`In.I..III.IIInIn‘mvID._.¢2
`
`eififitl
`
`InI‘._IIIIIIImv_._Oh<E
`
`..f.ur.u.vm“
`
`vIO._.<2
`
`....rWm..mm§
`
`ufwfiwwfi
`
`235i.'..I..‘.lII.....I.I...IIo_IO.—.<s_
`
`
`
`....mm£m...w....v
`
`FIGURE 8B
`
`TALLYL]
`
`..uMWLAT
`
`FIGURE 8A
`
`
`

`
`U.S. Patent
`
`Sep. 22, 1992
`
`Sheet 6 of6
`
`5,150,430
`
`
`
`320-N
`
`
`
`II
`
`OLD DATA
`
`PT Fl
`
`
`
`
`
`
`
`NEW DATA
`
`OLD DATA
`
`NEW DATA
`
`
` CODEWORDT
`
`
`
`I -
`
`I —
`
`-
`
`
`
`MRL I
`
`
`
`BEST
`BEST
`conewono
`cooewono
`
`SELECTOR
`SELECTOR
`3557
`
`(wnm LARGEST
`(wnm LARGEST
`°°°E“'°F‘D
`
`MRL)
`MRL)
`
`
`
`FIGURE 11
`
`

`
`1
`
`5,150,430
`
`LOSSLESS DATA COMPRESSION CIRCUIT AND
`METHOD
`
`The present invention relates generally to data com-
`pression systems and circuits and particularly to a cir-
`cuit and method for performing a number of data com-
`pression operations in parallel so as to substantially
`reduce the computation time required for performing
`lossless data compression.
`BACKGROUND OF THE INVENTION
`
`The term “lossless data compression“ means that all
`of the information in an input string or message is re-
`tained when that information is compressed so as to be
`represented by a smaller amount of data. Thus, when
`the compressed data is decoded, the reconstructed mes-
`sage will be identical to the original input message rep-
`resented by the compressed data.
`The present invention is. in essence, a highly efficient
`system and method for implementing a well known
`lossless data compression technique, sometimes known
`as the "Lempel and Ziv noisless universal data compres-
`sion algorithm". This data compression algorithm is
`also sometimes called the “history buffer“ algorithm.
`THE LEMPEI. AND ZIV LOSSLBSS DATA
`COMPRESSION METHOD.
`
`Referring to FIG. 1. this data compression algorithm
`works as follows. The data compression unit (or pro-
`gram) stores the last X (e.g., I024) characters (some-
`times called symbols) that have already been encoded in
`a "history buffer“ 90, typically implemented as a circu-
`lar buffer. When new input data is encoded. the data
`compression unit looks for the longest string in the
`history buffer 9|] that matches the new input data, be-
`ginning with the first character of that new input data.
`For instance, if the new input data is the string
`
`"the front axle of the vehicle snapped when .
`
`.
`
`. "
`
`the data compression unit would look for the longest
`matching string beginning with the letters "th .
`.
`. ". If
`the best match found in the history buffer was, say. “the
`f" (e.g.. as pan‘. of the phrase “the lirsl: time .
`.
`. "), then
`the "reproduction length" of the longest matching
`string would be five characters; “t“, “h“, “e“. "", and
`"f". The form of the codeword used to represent the
`compressed data is as follows:
`
`(pointer. reproduction length. tail chal'acIe1').
`
`Thus the codeword comprises (1) a pointer to a position
`in the history buffer where a matching string can be
`found, (2) the length of the matching string. and (3) a
`tail character which is the next character in the input
`string. One reason for including the tail character is to
`handle cases in which a character is not found in the
`history buffer. In such cases, the pointer and length
`fields of the codeword are set equal to "0.0“ and the tail
`character field is equal to the first character of the input
`string.
`Each time that a portion of the input string is con-
`verted into a codeword, the input string is shifted so
`that the beginning of the input string is the first charac-
`ter that has yet to be convened into codeword form.
`Furthermore,
`the contents oi’ the history buffer are
`periodically updated by adding the portions ofthe input
`string that have been processed to the end of the history
`
`10
`
`15
`
`20
`
`25
`
`3|]
`
`35
`
`40
`
`45
`
`SD
`
`55
`
`60
`
`65
`
`2
`buffer. Since the length of the history buffer is typically
`constant. adding new data at the end of the buffer usu-
`ally means that data at the beginning of the buffer is
`erased or overwritten. unless the history buffer is not
`yet full; Depending on the particular implementation of
`the data compression algorithm,
`the contents of the
`history buffer may be updated each time that a new
`codeword is generated. or it may be updated each time
`that a certain number of input characters have been
`processed. For instance.
`the contents of the history
`buffer may be updated every time that another thirty-
`two input string characters have been processed by the
`data compression algorithm.
`The net result of the data compression algorithm is
`that an input string is convened into a sequence of
`codewords, as described above. To see how this pro-
`duces data compression, consider the following exam-
`ple. Assume that the history buffer contains H124 char-
`acters, that the maximum allowed reproduction length
`will be seven, and that it takes eight bits to represent an
`uncompressed character. The length of each codeword
`will be:
`
`it] hits
`3 bits
`3 bits
`2] bits
`
`Pointer to history buffer position
`Reproduction length value
`Tail character
`length of each codeword
`
`If. on average. each codeword has a reproduction
`length of four characters. each codeword will represent
`a total of five characters, including the tail character.
`Thus, on average, it will take 21 bits to represent five
`characters instead of 40 bits. representing a compression
`ratio of about two to one (2:1).
`DECODING COMPRESSED DATA
`
`Data decoding. which is the process of converting
`the codewords back into regular character strings,
`is
`very easy. The data decoding unit or program maintains
`a history buffer that is identical to the one maintained by
`the data compression unit, updating the contents of the
`history buffer with newly decoded characters. Each
`codeword is decoded simply by using the pointer in the
`codeword to locate a position in its history buffer. Then
`the specified number of characters in the history buffer
`are copied, following by the specified tail character. to
`regenerate the string represented by the codeword.
`Data decoding is very fast and uncomplicated.
`Note that we have not specified the contents of the
`history buffer when one first begins to compress or
`decompress data. This is a peripheral issue not relevant
`to the present invention, but is addressed here to en-
`hance the reader's understanding of the compression
`method. There are at least two standard methods of
`dealing with what to put in the history buffer at the
`beginning of the process: (I) initially clear the history
`buffer (e.g., fill it with null characters), or (2) always
`initially fill the history buffer in both the data compres-
`sion unit and in the data decoding unit with a predefined
`set of data that is known to produce reasonably good
`data compression with a wide variety of input data. The
`first solution results in a lower data compression ratio
`when small sets of data are compressed. while the sec-
`ond requires a certain amount of coordination between
`the data compression and data decoding units. Either
`
`

`
`5,150,430
`
`3
`solution. or any other such solution. can be used in
`conjunction with the present invention.
`As will be understood by those skilled in the art. there
`is a set of tradeoffs between the length of history buffer,
`the maximum allowed reproduction length, the com-
`pression ratio and the amount of processing time re-
`quired to perform the data compression. In particular, if
`the history buffer is made longer (say to 4096 charac-
`ters). the chances of finding good matching strings in-
`crease, and the average reproduction length will in-
`crease. However, this will also increase the length of
`each codeword, because more bits will be required to
`represent positions in the history buffer. and it will take
`more processing time to perform the compression be-
`cause it will take longer to find the best matching string
`in a larger history buffer.
`PROCESSING TIME FOR DATA COMPRESSION
`
`The present invention is primarily concerned with
`the problem that compressing data using the Lempel
`and Ziv is very time consuming. As will be explained
`next using some rlurnerical examples. the Lempel and
`Ziv data compression method has been used in a mini-
`ber of contexts in which the speed at which data com-
`pression is accomplished is not critical. such as transmit-
`ting data over telephone lines using standard medium
`speed modems. The Lernpel and Ziv data compression
`method has generally not been used in high speed appli-
`cations. such as data storage on hard disks. because the
`amount of time required for data compression prior to
`storage on disk has been assumed to be excessive.
`Consider the following example of the processing
`time required to compress data for transmission over a
`24-01] baud {i.e.. 2400 bits per second) transmission line.
`which is typical of modem transmission used by desktop
`computers. What we are computing in this example is
`the number of instructions per second that would have
`to be executed by a software data compression program
`in order to keep up with a 2400 baud modem. We will
`assume a data compression ratio of two to one, meaning
`that 4800 bits of data, or about 600 characters must be
`encoded for transmission each second. Also assume a
`history buffer of 1024 characters. and that the average
`codeword represents five characters. Since the average
`codeword represents five characters. the compression
`unit must generate about 120 codewords per second. To
`generate a codeword. the compression unit must try to
`match an input string with strings of characters in the
`1024 character history buffer. A ballpark figure for
`performing this would be about 10.801) instructions:
`about four instructions for performing each character to
`character comparison, with an average of about 1.5 to
`2.0 such comparisons being required for each character
`in the history buffer, plus a small number of instructions
`required to keep track of the best match and to con-
`struct the codeword.
`
`Thus the number of executed instructions required to
`generate the 120 codewords needed on average each
`second would be about: 120): 10,000 or about 1.2 mil-
`lion instructions per second, requiring a computer capa-
`ble of at least 1.2 MIPS. Using an average of three or
`four computer clock cycles per instruction, 1.2 MIPS is
`equivalent to a CPU cycle rate of about -1- or 5 mega-
`hertz (i.e., 4 or 5 million computer clock cycles per
`second}. This is well within the capabilities of many
`desktop computers. with the fastest desktop computers
`in 1990 having CPU cycle rates of approximately 40
`megahertz. In comparison, data decoding requires only
`
`4
`about a hundred instructions per codeword, and thus is
`extremely fast.
`Next, we consider the processing requirements for
`using the Lernpel and Ziv data compression method for
`storing and retrieving data to and from a hard disk or a
`comparable medium in a computer system. The idea
`here is that if data compression can be used with such
`hard disks, then one could approximately double the
`amount of data stored on any hard disk.
`The hard disks used in desktop computers, as of the
`end l990, can store up to about I million characters per
`second. High perforrnance hard disks used with more
`expensive computer systems can store up to about 3.5
`million characters per second. but we will use a data
`storage rate of 1 million characters per second for the
`purposes of this example. Using a data compression
`scheme with a two to one compression ratio, the rate of
`data storage would be about 2 million characters per
`second. or about 400,000 codewords per second (using
`an average of five characters per codeword}. In order
`to avoid slowing down the rate of data storage to disk,
`data compression would need to be accomplished about
`3300 times faster than in the above 2400 baud modem
`example. Thus. performing “searnless" data compres-
`sion for hard disk storage without impacting on a com-
`puter's data storage perfonnance is much, much more
`challenging. Furthermore. it is clear that such data
`compression cannot be accomplished witlt software,
`since it would require that the computer's processor
`have a CPU cycle rate on the order of 10 gigaherrz.
`The present invention provides a data compression
`circuit using parallel processing techniques which is
`sufficient fast that it can keep up with the data storage
`rates of current hard disks. That is, using 2140 megahertz
`clock rate and the current invention, one can compress
`two million bytes or characters per second into the
`codewords associated with the Lempel and Ziv lossless
`data compression method.
`_
`It is noted that there is always a trade-off between
`implementation feasibility. or processing complexity, of
`a data compression system and compression ratio. The
`present invention provides data compression apparatus
`that is feasible for high speed. using a simple. hardward
`implementation.
`SUMMARY OF THE INVENTION
`
`In summary, the present invention is a lossless data
`compression circuit which produces a sequence of
`codewords representing a sequence of successive, non-
`overlapping substrings of a specified set of data. A first
`shift register stores a new input string. while a second
`shift register stores and shifts a prefix string until all of
`the characters in the prefix string data have been com-
`pared with the new data string.
`A composite reproduction length circuit finds the
`maximum lgth strings within the prefix string which
`match substrings of characters in the new data string
`beginning at each position in the new data string. The
`composite reproduction length circuit produces amulti-
`plieity of data pairs. one for each position of the new
`data string. Each data paircomprises a msrtimurn length
`value. corresponding to the maximum length matching
`prefix string found for the new data substring starting at
`the corresponding position. and a pointer value denot-
`ing where the maximum length matching string is lo-
`cated in the prefix string. A codeword generator then
`generates a sequence of codewords representing the
`new data string, each codeword including one of these
`
`at
`
`IO
`
`15
`
`20
`
`25
`
`30
`
`35
`
`4-5
`
`55
`
`65
`
`

`
`5
`data pairs and representing a substring of the new data
`string.
`In the preferred embodiment, the composite repro-
`duction length circuit has the following components. A
`reproduction length register stores a separate length
`value corresponding to each position of the new data
`string, and a pointer register stores a separate pointer
`value corresponding to each position of the new data
`string. A counter generates a pointer value denoting
`which subset of the prefix characters are currently
`stored in the shift register. A modified tallying circuit
`generates a tally value for each position in the new data
`string, each tally value representing how many consec-
`utive prefix characters in the shift register, up to a pre-
`defined maximum value. match characters in the new
`data string beginning at a corresponding position in the
`new data string. Finally. a decoder compares each tally
`value with a corresponding one of the length values
`stored in the reproduction length register. and when the
`telly value exceeds the corresponding stored length
`value, it replaces the stored corresponding length value
`with the tally value and replaces the corresponding
`pointer value stored in the pointer register with the
`pointer value generated by the counter.
`By using N such data compression units in parallel,
`with each storing an identical new data string, and each
`unit’s shift register storing and shifting a different subset
`of a. specified prefix string, processing time for generat-
`ing codewords is reduced by a factor of (N — I)/N (e.g.,
`using three parallel data compression units would re
`duce processing time by a factor of approximately §).
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Additional objects and features of the invention will
`be more readily apparent from the following detailed
`description and appended claims when taken in con-
`junction with the drawings. in which:
`FIG, 1 conceptually depicts how a codeword is gen-
`erated from an input string and a prefix string.
`FIG. 2 is a conceptual diagram of a computer system
`incorporating the lossless data compression and data
`decoding circuits of the present invention in the data
`path between a hard disk storage system and the com-
`puter system's internal‘ bus.
`FIG. 3 is a block diagram of a simple hardware cir-
`cuit for lossless data compression.
`FIG. 4 is a block diagram of a circuit for decoding
`compressed data generated using the circuit of FIG. 3.
`FIG. 5 is a conceptual diagram of a high speed loss-
`less data compression circuit which speeds data com-
`pression using one type of parallel data processing.
`FIG. 6 is a block diagram of a preferred embodiment
`of a high speed lossless data compression circuit using
`the firs! type of parallel data processing.
`FIG. 7 is a block diagram of a data comparison cir-
`cuit.
`FIGS. 8A and 83 are circuit schematics for a modi-
`fied tally circuit.
`FIG. 9 is a block diagram of a decoder for selecting
`the longest string in a set of comparison data which
`matches a specified new data string.
`FIG. 10 is a block diagram of codeword generator
`circuit.
`FIG. 11 is a block diagram of a high speed lossless
`data compression circuit which speeds data compres-
`sion nsing a second type of parallel data processing.
`
`5,150.4-30
`
`6
`
`‘ 5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`45
`
`S0
`
`55
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENT
`
`Referring to FIG. 2, we show a typical computer
`system 100 in which the present invention can be used.
`The computer system 10!} includes a CPU 1il2, memory
`module 104, an internal bus Ill}, and a hard disk storage
`system 112 which stores data on a hard disk 114. In the
`preferred embodiment, data being written to the hard
`disk 114 are processed by a data encoding circuit 120
`and data being retrieved from the hard disk are pro-
`cessed by a data decoding circuit 150. In the preferred
`embodiment, both the encoder 129 and decoder 150 are
`implemented on a single, integrated circuit 116. The
`combined encoder/decoder circuit 116 is used in the
`data path between a computer's internal bus 110 and its
`hard disk storage system 112. causing data stored on
`hard disk 114 to be automatically encoded and com-
`pressed as it is being written to the hard disk 114, and to
`be automatically decoded as it is retrieved. If the en-
`coder circuit 120 is sufficiently fast at performing the
`necessary encoding operations, the use of the data com-
`pression and decoding circuits will be “seamless“. in
`that their presence would not aifect the performance of
`the computer system 100, except that the amount of
`data that could be stored on the hard disk 114 would be
`approximately doubled (assuming a compression ratio
`of 2:1).
`Referring to FIG. 3. we will first describe a very
`simple hardware implementation of a data compression
`circuit 120 for performing the Letnpel and Ziv lossless
`data compression algorithm. The first data encoding
`circuit 120 is approximately thirty times faster than a
`comparable software data compression program. Later,
`we will show a data compression circuit which is faster
`than this first circuit, on average, by a factor often. This
`means that, on average, the improved circuit reduces
`the number of clock cycles required to produce the
`same codewords by about a factor of ten. As will also be
`described below, the speed of data compression can be
`further increased by an additional factor of four to ten,
`if one is willing to use a commensurate amount of addi-
`tional circuitry.
`The term "character" as used in this document is
`defined to mean any symbol or other form of data
`which occupies a predefined amount of storage space
`(i.e., a predefined number of hits), such as a byte or
`“word".
`In the following description. the string of characters
`that is being compressed is represented by the letter S.
`The string S contains characters at positions beginning
`at position 0 and continuing to position N— 1, where -N
`is the length of the string. A substring ofS which starts
`at position i and ends at position j is denoted as S(i,j).
`For convenience,
`the string currently stored in the
`history buffer is labelled H, and substrings in the history
`buffer are denoted H(i,_i). where i and j are the begin-
`ning and ending positions of the substring in the history
`buffer. The contents of the history bufier are sometimes
`called a proper prefix of the string which is currently
`being processed for data compression.
`Given a string S, the reproduction length of the string
`8 at position i is the largest integer I..,- such that
`
`65
`
`sa.i+L.-— n=Ha=.k+L.--1}
`
`where H represents the contents of the history buffer
`and It is any position within the history buffer. The
`
`

`
`5,150,430
`
`7
`position P,-in the history buffer corresponds to the maxi-
`mum value of the reproduction length L,-at position i.
`Let S=s1s;. .
`. be the string of source symbols that is
`being compressed. The sequential encoding of S parses
`S into successive source words, S=S1S2 .
`.
`.
`. For each
`source word S,-, the encoder looks for the longest repro-
`duction length L,-, and assigns a codeword C,-.
`
`5
`
`IO
`
`20
`
`25
`
`30
`
`35
`
`45
`
`SD
`
`55
`
`65
`
`The encoding procedure can be formally described as
`follows.
`1. Initially. SR2 is reset to be 0. SR1 stores the string to
`be parsed.
`2. The history buffer string is shifted into SR2 sequen-
`tially, at the rate of one symbol per clock cycle. RLG
`122 compares SR1 against SR2 at each clock cycle. A
`reproduction length L is generated. At the end of
`every clock cycle, the contents of Mill. and pointer
`registers 124 and 126 are updated. if necessary.
`3. After N clock cycles. where N is the length of the
`history buffer string. a codeword C; is generated.
`Ct= < P.'.Lr.T:'>-
`4. To update the contents of SR], the symbols occupy-
`ing the first L,-+l positions are shifted out, while
`L,-+1 new source symbols are shifted in. Go back to
`step 1. and repeat the procedure until the end of the
`string to be parsed is reached.
`In the preferred embodiment, during the first CLK2
`cycle during which old symbols are shifted out of SR]
`and new symbols are shifted in, the contents of the
`MRI. and pointer registers 124 and 126 are latched into
`the codeword bulfer 134. and simultaneously these reg-
`isters are cleared. At the same time, the contents of SR2
`are also cleared. Thus. no additional clock cycles are
`required for codeword generation. Codewords are gen-
`erated. and registers are cleared for the new codeword
`generation cycle. simultaneously with the shifting in of
`new symbols into the SR] register. Using a history
`buffer of 1024 symbols, and data registers SR1 and SR2
`oflength Lrnax. it will take 1024+Lmax— l+LC clock
`cycles to generate a codeword which represents LC
`symbols, as well as to prepare for generating the next
`codeword.
`Assuming that codewords. on the average. each rep-
`resent five symbols,
`the data compression circuit of
`FIG. 3 requires about 206 clock cycles per character to
`compress a set of data. As a result, a computer system
`with a 4-D megahertz clock rate could compress data
`using this circuit at a rate of about 200,000 characters or
`symbols per second. This is about a factor often slower
`than the rate of processing required for seamless pro-
`cessing of data being stored to a hard disk at a rate of
`one million bytes per second. which is equivalent to
`about two million symbols per second for it system with
`a two to one compression ratio.
`The data compression circuits described below with
`reference to FIGS. 5 through 11 overcome the failure
`of the circuit of FIG. 3 to encode data with sufficient
`speed. First, however, we describe a compressed data
`decoder circuit. which while simple. is more than suiti-
`ciently fast to keep up with even the fastest hard disk
`systems.
`
`COMPRESSED DATA DECODER
`
`Referring to FIG. 4, the compressed data decoder
`circuit 150 temporarily stores received codewords in a
`buffer 152, which are then loaded into codeword regis-
`ter 154 using clock CLK2, which is generated by finite
`state machine (FSM) 160 each time the decoder circuit
`150 is ready to process the next codeword. The current
`contents of the history buffer or prefix string are stored
`in buffer 162. which is typically a memory circuit used
`as circular buffer. The FSM 160 keeps track of the
`current position of the beginning of the prefix string
`within the message buffer 162, and generates a corre-
`sponding OFFSET value that is added by address gen-
`erator circuit l6Il to the pointer value PTR from the
`
`C',--—- <8. Li. 71>
`
`where T; is the last symbol of 5;. To ensure that all
`codewords have the same number of bits, a maximum
`value Lrnax is imposed on the reproduction length,
`typically 1 or 15.
`The core of the Letnpcl-Ziv encoder shown in FIG.
`3, is implemented with two shift registers, SR1 and SR2,
`a reproduction length generator (RLG) 122, maximum
`reproduction length and pointer latches 114 and 126,
`and a counter 128. In addition, a memory circuit or
`message buffer 130 is used to store the data that is being
`compressed, a finite state machine (FSM) 132 controls
`the sequence of steps performed by the entire circuit
`120. and a codeword buffer 134 is used to temporarily
`hold portions of the codewords generated by the circuit
`120. The FSM 132 generates the two clock signals CLK
`and CLK2 which control the shift registers SR1 and
`SR2, and also generates a signal CW'_RDY whenever a
`new codeword has been generated.
`Both shift registers SR1 and SR2 are of the same
`length. In particular. they both store a number of char-
`acters which is equal to the maximum allowed repro-
`duction length. For specificity. we will assume that
`both SR1 and SR2 store seven characters at any one
`time.
`Shift register SR1 stores the substring to be parsed.
`After SR] is initially loaded with data, the contents of
`SR! are shifted only after the generation of each code-
`word. As SR] is shifted, new data is loaded into the shift
`register from the message buffer 130. In particular, if a
`codeword with a reproduction length of L is generated,
`SR1 is shifted L+ 1 times using clock signal CLK2. The
`last character shifted out of SI-‘.1 is held in the codeword
`buffer 134 as the tail character of the codeword.
`At the beginning of each codeword generation cycle.
`the contents of registers 124 and 126 are cleared. i.e., set
`to zero. In addition. the counter l23_ is also reset to a
`predefined beginning value. such as zero. Then the
`history buffer contents, also called the proper prefix
`substring,
`is shifted through SR2 using clock signal
`CLK. Lmax—l clock cycles are required to preload
`SR2. where Linux is the maximum allowed reproduc-
`tion length. and is also the number of characters posi-
`tions in SR1 and SR2. Then, each time that SR2 shifts,
`the reproduction length generator 122 compares the
`contents of SR] and SR2. and generates a reproduction
`length value L. Comparator 136 compares the current
`reproduction length value I. with the current value
`MR1. stored in the maximum reproduction length regis-
`ter 124. If the current reproduction length value I. is
`greater than MRL,
`the comparator 136 generates a
`LOAD signal, which causes the current reproduction
`length value I. to be stored in register 12!. and also
`causes the current pointer value generated by counter
`128 to be stored in pointer register 125.
`After the proper prefix substring has completely
`shifted through SR1, the codeword can be generated
`from the contents of MRL register 124 and pointer
`register 126. followed by the appropriate tail symbol.
`
`

`
`5,150,430
`
`9
`codeword being decoded. The resulting address is the
`address of the first character of the substring repre-
`sented by the codeword. The OFFSET value is then
`incremented by the FSM 160 L times, where L is the
`reproduction value from the codeword being decoded.
`during each new CLK clock cycle, until the decoding
`of a new codeword begins.
`After the initial CLK2 signal used to load a code-
`word into register 154, the FSM generates L+l clock
`signals CLK. is the reproduction length value obtained
`from the codeword being decoded. At each CLIC sig-
`nal, a new decoded symbol is generated. The new de-
`coded symbol is read from the output of a multiplexer
`166. which outputs either (I) the tail character T from
`the codeword being decoded, or (2) the symbol read
`from the message buffer 162 at the address generated by
`address generator 164. These decoded symbols are then
`stored at the appropriate position in the history buffer
`162. as well as in an output message buffer 170. Depend-
`ing on how the history buffer 162 is designed. the read-
`ing of a decoded symbol from one address and the stor-
`age of that symbol back into the history buffer at a
`different address can be accomplished in either one or
`two clock cycles. By using a history buffer that stores,
`say, 1024 bytes. but limiting the length of prefix string to
`say. 1020 or 1023 bytes, one of ordinary skill in the art
`can de

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket