`
`[19]
`
`[11] Patent Number:
`
`5,973,630
`
`Heath
`
`[45] Date of Patent:
`
`Oct. 26, 1999
`
`US005973630A
`
`[54] DATA COMPRESSION FOR USE WITH A
`COMMUNICATIONS CHANNEL
`
`[57]
`
`ABSTRACT
`
`[75]
`
`Inventor: Robert Jefi' Heath, San Diego, Calif.
`
`[73] Assjgnee; Hughes E]ectmnic5 Corporation, E1
`Segundo, Cafif.
`
`21 A 1. N .2 09 258 782
`]
`pp
`0
`/
`’
`[
`[22]
`Filed;
`Mar, 1, 1999
`
`Related U_s_ Application Data
`
`[62] Division of application No. 08/982,864, Dec. 2, 1997.
`[51]
`Int. cm ...................................................... H03M 7/00
`[52] U.S. Cl.
`................................................. 341/87; 341/51
`[58] Field of Search .............................. .. 341/87, 51, 106,
`341/67, 95, 50
`
`[56]
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`5,473,326
`
`12/1995 Harrington et al.
`
`.................... .. 341/51
`
`Primary Examiner—Brian Young
`Attorney, Agent, or Firm—John T. Whelan; Michael W.
`Sales
`
`A method of compressing data involves receiving a symbol,
`and 3 Subsequent 5YH1b01; determining in 3 C0mPre55i0n
`dictionary whether the symbol has a valid extension pointer;
`using, in the event the symbol does have a valid extension
`pointer, the valid extension pointer to access string extension
`symbols; determining, in the event the symbol does have a
`valid extension pointer, Whether the string extension sym-
`bols equal the at least one subsequent symbol; determining
`in the compression dictionary, in the event the string exten-
`sion symbols do not equal
`the at least one subsequent
`symbol, Whether the symbol has a valid parallel extension;
`repeating,
`in the event
`the symbol has a valid parallel
`extension, the using step; repeating, in the event the string
`33331101;h:yg;§g§;s1g;;;;g;1;;g;§r jL:e:;:n‘;‘;j 3135:1133;
`extensién pointer; inserting’ in the event the Symbol does not
`have a valid extension pointer or in the event the symbol
`does not have a valid parallel extension, a code Word
`indicative of a longest string found into a compressed data
`stream; determining Whether the longest string was a single
`symbol; extending, in the event the longest string was a
`single symbol, the longest string by one symbol; extending,
`in the event the longest string was not a single symbol, the
`longest string by a plurality of symbols; inserting a string
`extension signalling code word if the string is extended by
`multiple symbols and transmitting the compressed data
`stream through the communications channel.
`
`20 Claims, 9 Drawing Sheets
`
`74
`
`/6‘
`
`DECOMPRESSION
`COMPRESSION
`
`
`DICTIONARY
`DICTIONARY
`
`(ADAPTIVE)
`(ADAPTIVE)
`24
`
`
`517
`
`
`
`ORACLE 1 104
`
`
`
`U.S. Patent
`
`5,973,630
`
`9SM.1..............p...................................4.mHmmmmHw_E>_E<9:
`
`
`
`
`
`___HHHH
`
`Nu
`
`
`
`._.:nH._.:OHmommmmmzoomo_ommmmEs_oo_mommmmazoo_E.<oSn_z_ E>_E<9:_H>m<zo:o_o>m<zo_.S_oHHzo_mmmEs_oomozo_mwmEzooH__mH__H1,HI__IHmI_mm_<55HM:_m<53
`
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 2 of 9
`
`5,973,630
`
`COMPRESSION DICTIONARY
`LOCATION
`
`FIRST 256 ENTRIES OF
`COMPRESSION DICTIONARY ONLY
`HAVE AN EXTENSION POINTER
`
`101h
`102h
`
`106h
`107h
`108h
`
`3ffh
`
`INVALID
`INVALID
`
`0 23456
`
`9 A B C D E F10
`
`1213141516
`
`“COMPUTER-°COMPU
`
`IIIII
`
`oI
`
`'~2*+1;g:w
`
`COMPRESSED
`DATA STREAM
`
`c 0 M P u T E R --[100h][10Dh]
`
`-- [man]
`
`F/G.
`
`.3’
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 3 of 9
`
`5,973,630
`
`01234557 9ABCDEF101213141516
`
`COMPRESSED
`DATA STREAM c 0 M P u T E R
`
`, ,
`
`[100h][10Dh]
`
`, ,
`
`[10311]
`
`CM”
`';:R
`O
`1
`1
`1111
`}£%%%%%Z%
`
`--COMPUTER--COMPU
`1
`‘%
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 4 of 9
`
`5,973,630
`
`600
`
`COMPRESSOR
`START
`
`602
`
`9
`
`GET NEXT
`SYMBOL TO
`PROCESS
`
`6'04
`
`USE POINTER TO
`DOES SYMBOL
`ACCESS STRING
`OR STRING HAVE A
`EXTENSION
`VALID EXTENSION
`SYMBOLS
`POINTER?
`
`
`608
`
`
`
`
`
`
`
`
`NO
`
`
`LONGEST EXISTING
`DO ALL STRING
`EXTENSION
`3
`RI
`MA¥G‘I'3OI;"oLSIND,NI(%s
`SYMBOLS
`
`
`
`EQUAL THE
`cone wono IS A
`
`SYMBOLS
`SINGLE SYMBOL
`
`FOLLOWING THE
`oI=I
`IMPLIED BY THE
`
`SYMBOL
`LAST STRING
` EXTENSION ENTRY
`BEING PROCESSED?
`
`
`
`614
`
`
`
`
`
`IS THERE A
`VALID PARALLEL
`EXTENSION
`POINTER?
`
`
`
`
`
`
`
`
`
`PLACE CODE WORD
`REPRESENTING
`LONGEST STRING
`(OR SINGLE
`SYMBOL)
`INTO
`THE COMPRESSED
`OUTPUT
`
`
`
`
`
`
`
`YES
`
`F/G. 6/I
`
`616‘
`
`
`
` WAS THE
`LONGEST STRING
`FOUND A SINGLE
`SYMBOL?
`
`
`
`NO
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 5 of 9
`
`5,973,630
`
`644
`
`
`
`ARE THERE MORE
`THAN THE MAXIMUM
`
`STRING LENGTH OF
`REPEATED SYMBOLS,
`EQUAL TO THE FIRST
`SYMBOL OF THE
`LONGEST STRING,
`FOLLOWING THE
`
`LONGEST STRING
`
`FOUND?
`
`NO
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`YES
`
`
`
`
`
`
`
`
`
`PLACE A RUN
`LENGTH ENCODING
`
`SIGNALLING CODE
`
`WORD FOLLOWED BY
`THE NUMBER OF
`
`REPEATED
`
`SYMBOLS INTO
`
`THE COMPRESSED
`OUTPUT
`
`
`
`6'20
`
`
`IS LONGEST
`STRING EQUAL TO YES
`THE MAXIMUM
`
`STRING LENGTH?
`NO
`
`622
`
`USING THE LONGEST
`STRINGS EXTENSION
`ENTRY. COMPARE
`THE SYMBOLS AT
`THE LOCATION
`POINTER, PLUS THE
`COUNT PLUS ONE,
`TO THE SYMBOLS
`STARTING WITH THE
`NEXT SYMBOL
`TO BE PROCESSED
`
`
`
`6'24
`
`
`
`DO AT LEAST
`2 SYMBOLS
`
`MATCH?
`
`
`YES
`
`
`
`6'26‘
`
`
`
`
`
`CREATE A MULTI
`SYMBOL STRING
`EXTENSION ENTRY
`
`AND EXTEND THE
`
`STRING BY THE
`NUMBER OF
`SYMBOLS
`THAT MATCH
`
`
`
`
`
`
`
`
`
`F/G.
`
`63
`
`
`
`
`
`
`
`
`CREATE A ONE
`SYMBOL STRING
`EXTENSION ENTRY
`AND EXTEND THE
`
`STRING BY ONE
`
`SYMBOL, THE
`
`
`PLACE A MULTI
`NEXT SYMBOL
`SYMBOL
`TO PROCESS
`
`EXTENSION STRING
`
`SIGNALLING CODE
`WORD INTO THE
`COMPRESSED
`OUTPUT
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 6 of 9
`
`5,973,630
`
`630
`
`LINK THE NEWLY CREATED STRING
`EXTENSION TO THE LONGEST STRING
`BY PLACING THE CREATED EXTENSION
`ENTRIES IMPLIED CODE WORD INTO
`THE EXTENSION POINTER OF THE
`LAST STRING EXTENSION OF THE
`
`LONGEST STRING
`
`6'32
`
`PLACE THE POINTER TO THE NEXT
`SYMBOL TO PROCESS INTO THE
`LOCATION POINTER OF THE NEWLY
`
`CREATED STRING EXTENSION ENTRY
`
`PLACE THE NUMBER OF EXTENSION
`SYMBOLS MINUS ONE INTO THE
`COUNT OF THE NEWLY CREATED
`
`STRING EXTENSION ENTRY
`
`634
`
`PLACE AN ILLEGAL VALUE INTO THE
`EXTENSION AND PARALLEL POINTERS
`OF THE NEWLY CREATED STRING
`EXTENSION ENTRY
`
`636
`
`O
`
`638
`
`
`BYPASS LONGEST STRING AND ANY
`OTHER SYMBOLS COMPRESSED
`BY THE CODE WORDS PLACED INTO
`
`
`THE COMPRESSED OUTPUT AND
`POINT TO THE NEXT
`
`
`UNPROCESSED SYMBOL
`
`
`
`640
`
` CARE THERE
`
`
`MORE SYMBOLS TO
`PROCESS?
`
`
`~°
`
`FIG. 50
`
`YES
`
`9
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 7 of 9
`
`5,973,630
`
`700
`
`START
`
`702
`GET NEXT CODE
`wonu TO
`PROCESS
`
`6
`
`704
`
`705-
`
`DOES CODE
`ORR §ElPc§L%3ENT YES
`SYMBOL?
`
`PLACE THE SYMBOL INTO
`THE NEXT UNCOMPRESSED
`LOCATION IN THE OUTPUT
`
`
`
`
`
`USE THIS CODE WORD AS AN
`INDEX INTO THE DICTIONARY
`AND GET THE POINTER,
`IN THE
`PREVIOUSLY UNCOMPRESSED
`OUTPUT. TO THE FIRST
`SYMBOL OF THIS CODE
`WORDS STRING (WHICH IS
`AT THE LOCATION POINTER
`MINUS DEPTH PLUS ONE).
`COPY DEPTH NUMBER OF
`SYMBOLS FROM THAT
`POINTER TO THE NEXT
`UNCOMPRESSED LOCATION
`IN THE OUTPUT
`
`714
`
`0
`
`712
`
`USE THE PREVIOUS CODE WORD
`PROCESSED AS AN INDEX INTO
`THE DICTIONARY AND GET THE
`
`
`POINTER,
`IN THE PREVIOUSLY
`THE CODE
`
`UNCOMPRESSED OUTPUT, TO
`WORD IS NOT IN
`
`
`THE FIRST SYMBOL OF THE
`THE DICTIONARY
`BUT IS IT EQUAL YES PREVIOUS CODE WORDS STRING
`
`
`TO THE NEXT
`(WHICH IS AT LOCATION
`SEQUENTIAL
`POINTER MINUS DEPTH PLUS
`
`
`CODE WORD
`ONE). COPY DEPTH NUMBER OF
`TO BE BUILT?
`SYMBOLS FROM THAT
` POINTER TO THE NEXT
`NO
`UNCOMPRESSED LOCATION
`IN THE OUTPUT
`
`
` IS THE CODE
`WORD IN THE
`DICTIONARY
`(l.E.
`IS IT LESS
`THAN THE NEXT
`SEQUENTIAL
`CODE wono
`TO BE BUILT)?
`
`YES
`
`NO
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 8 of 9
`
`5,973,630
`
`730
`
`YES
`
` IS THE
`
`DEPTH OF THE
`PREVIOUS CODE
`
`
`WORD PROCESSED
`EQUAL TO THE
`
`
`‘ MAXIMUM STRING
`
`LENGTH?
`
`
`
`NO
`
`732
`
`716‘
`
`THE CODE WORD IS NOT IN THE
`DICTIONARY BUT IT IS GREATER
`THAN THE NEXT SEQUENTIAL
`CODE WORD TO BE BUILT.
`USING THE PREVIOUS CODE WORD
`PROCESSED AS AN INDEX INTO
`THE DICTIONARY GET THE
`LOCATION POINTER AND DEPTH
`AND CREATE A "FROM POINTER“
`TO THE LAST SYMBOL. PLUS
`ONE. OF THE PREVIOUS CODE
`WORDS STRING. SUBTRACT THIS
`CODE WORD FROM THE NEXT
`SEQUENTIAL CODE WORD TO BE
`BUILT TO GET THE NUMBER OF
`SYMBOLS BY WHICH THE
`PREVIOUS CODE WORD IS
`EXTENDED..COPY THAT NUMBER OF
`SYMBOLS FROM THE “FROM
`POINTER‘-IN THE PREVIOUSLY
`UNCOMPRESSED OUTPUT TO THE
`NEXT UNCOMPRESSED LOCATION
`IN THE OUTPUT.
`
`734
`
`736
`
`FIG. 7B
`
`COPY THE FIRST
`SYMBOL OF THE
`PREVIOUS CODE WORDS
`STRING TO THE NEXT
`UNCOMPRESSED LOCATION.
`IN THE OUTPUT.
`
`
`
`
`
`
`
`
`
`
`
`
`THIS IS A RUN LENGTH
`ENCODING CODE WORD.
`GET THE NEXT CODE
`WORD WHICH IS THE
`NUMBER OF REPEATED
`SYMBOLS AND
`REPEAT THE FIRST
`SYMBOL OF THE
`PREVIOUS CODE WORDS
`STRING THAT NUMBER
`OF TIMES INTO THE
`NEXT UNCOMPRESSED
`LOCATION IN THE
`OUTPUT
`
`
`
`
`
`
`
`UPDATE THE NEXT
`UNCOMPRESSED
`LOCATION POINTER BY
`
`
`THE NUMBER OF
`
`
`SYMBOLS PLACED
`
`
`INTO THE OUTPUT
`
`
`WHILE PROCESSING THE
`
`
`CODE WORD AND RUN
`
`
`LENGTH CODE WORDS
`
`
`
`U.S. Patent
`
`Oct. 26, 1999
`
`Sheet 9 of 9
`
`5,973,630
`
`778
`
`
`
`
`UPDATE THE NEXT UNCOMPRESSED
`LOCATION POINTER BY THE NUMBER
`OF SYMBOLS PLACED INTO THE
`OUTPUT WHILE PROCESSING THE
`CODE WORD
`
`
`
`
`
`
`BUILD AN EXTENSION STRING TO THIS
`CODE WORD STRING USING THE NEXT
`SEQUENTIAL DICTIONARY ENTRY
`
`720
`
`722
`
` PLACE THE POINTER TO THE LAST
`
`
`SYMBOL PLACED OR COPIED INTO
`THE NEXT UNCOMPRESSED LOCATION
`INTO THE LOCATION POINTER OF THE
`
`
`NEWLY CREATEéDN1S|;I'IY’IING EXTENSION
`
`726'
`
`ADD THE NUMBER OF STRING
`EXTENSION SYMBOLS, OR ONE IF
`A SINGLE SYMBOL CODE WORD,
`TO THE DEPTH OF THE PREVIOUS
`CODE WORD AND PLACE THE RESULT
`INTO DEPTH OF THE NEWLY CREATED
`STRING EXTENSION ENTRY.
`
`
`
`
`
`
`
`
`
`
`
`YES
`
`
`726
`
`ARE THERE
`
`MORE CODE WORDS TO
`
`PROCESS?
`
`
`
`
`
`FIG. 70
`
`”°
`
`9
`
`
`
`5,973,630
`
`1
`DATA COMPRESSION FOR USE WITH A
`COMMUNICATIONS CHANNEL
`
`This application is a division of application Ser. No.
`08/982,864 filed Dec. 2, 1997.
`BACKGROUND OF THE INVENTION
`
`The present invention relates to data compression (i.e.,
`creation of compressed data from uncompressed data) and
`decompression (i.e., recovery of the uncompressed data
`from the compressed data).
`Data compression systems are known in the prior art that
`compress a stream of digital data signals (uncompressed
`bits) into compressed digital data signals (compressed bits),
`which require less bandwidth (fewer bits) than the original
`digital data signals, and that decompress the compressed
`digital data signals back into the original data signals or a
`close approximation thereof. Lossless data compression
`systems decompress the compressed digital data signals
`back into the original data signals exactly. Thus, lossless
`data compression refers to any process that converts data
`into an alternative data form that requires less bandwidth,
`i.e., has fewer bits, than the data converted in a process that
`is reversible so that the original data can be recovered.
`Accordingly, the objective of data compression systems is
`to effect a savings in an amount of storage required to hold
`the data or the amount of time (or bandwidth) required to
`transmit the data. By decreasing required space for data
`storage or required time (or bandwidth)
`for data
`transmission, data compression results in a monetary and
`resource savings.
`Acompression ratio is defined as the ratio of the length of
`the data in the alternative data form (compressed data) to the
`length of the data originally (original data). Thus defined, the
`smaller the compression ratio, the greater will be the savings
`in storage, time, or bandwidth.
`If physical devices such as magnetic disks or magnetic
`tape are utilized to store the data, then a smaller space is
`required on the device for storing the compressed data than
`would be required for storing the original data, thereby, e.g.,
`utilizing fewer disks or tapes for storage. If telephone lines,
`satellite links or other communications channels are utilized
`
`for transmitting digital information, then lower costs, i.e.,
`shorter transmission times and/or smaller bandwidths, result
`when compressed data is employed instead of original data.
`Data compression systems can be made particularly effec-
`tive if the original data contains redundancies such as having
`symbols or strings of symbols appearing with high fre-
`quency. (In fact redundancies in the original data is a
`requirement for lossless data compression.) A data compres-
`sion system operating on original data containing redundan-
`cies may, for example, transform multiple instances of a
`symbol, or transform a string of symbols, in the original data
`into a more concise form, such as a special symbol or group
`of symbols indicating multiple occurrences of the symbol, or
`indicating the string of symbols, and thereafter translate or
`decompress the concise form back into the multiple
`instances of the symbol, or back into the string of symbols.
`For example, it may be desirable to transmit the contents
`of a daily newspaper via a satellite link or other communi-
`cations link to a remote location for printing. Appropriate
`sensors within a data compression system may convert the
`contents of the newspaper into a data stream of serially
`occurring characters for transmission via the satellite link. If
`the millions of bits comprising the contents of the daily
`newspaper were compressed before transmission and
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`decompressed at the receiver, a significant amount, e.g.,
`such as 50% or more, of transmission time (or bandwidth)
`could be saved.
`
`As a further example, when an extensive database such as
`an airline reservation database or a banking system database
`is stored for archival or backup purposes, a significant
`amount of storage space, such as 50% or more, can be saved
`if the database files are compressed prior to storage and
`decompressed when they are retrieved from storage.
`To be of practical and general utility, a digital data
`compression system should satisfy certain criteria.
`Specifically, one criterion is that the system should provide
`high performance,
`i.e., compression/decompression rates,
`for both compression and decompression with respect to the
`data rates in the communications channel being utilized, be
`it a data bus, a wired network, a wireless network or the like.
`In other words, data transmission rates seen by a sender of
`uncompressed data and a receiver of the uncompressed data
`should not be reduced as a result of compression/ decom-
`pression overhead. In fact, effective data rates achieved, may
`be significantly increased over slow communications
`channels, because more original data can be transmitted per
`unit time, if the original data is compressed preceding and
`following transmission, because there is less compressed
`data to transmit that there would have been original data.
`The rate at which data can be compressed (i.e.,
`the
`compression rate) is the rate at which the original data can
`be converted into compressed data typically specified in
`millions of bytes per second (megabytes/sec). The rate at
`which data can be decompressed (i.e., the decompression
`rate) is the rate at which compressed data can be converted
`back into original data. High compression rates and high
`decompression rates are necessary to maintain,
`i.e., not
`degrade, data rates achieved in present day disk, tape and
`communication systems, which typically exceed one
`megabyte/sec. Thus, practical data compression systems
`must typically have compression and decompression rates
`matching or exceeding some application-dependent
`threshold, e.g., one megabyte/sec.
`The performance of prior art data compression systems is
`typically limited by the speed of the random access memo-
`ries (RAM) and the like utilized to store statistical data and
`guide the compression and decompression processes. High
`performance compression rates and decompression rates for
`a data compression system can thus be characterized by a
`number of cycles (read and write operations) required per
`input character into or out of the data compression system.
`Fewer memory cycles per input character leads to higher
`performance compression rates and decompression rates.
`Another important criterion in the design of a data com-
`pression and decompression system is compression effec-
`tiveness. Compression effectiveness is characterized by the
`compression ratio of the system, i.e. a smaller compression
`ratio indicates greater compression effectiveness. However,
`in order for data to be compressible using a lossless data
`compression system, the data to be compressed must contain
`redundancies. As a result, the compression ratio, or com-
`pression effectiveness, in a lossless data compression system
`(and to a lesser degree in a lossy data compression system)
`is a function of the degree of redundancy in the data being
`compressed. The compression effectiveness of any data
`compression system is also affected by how effectively the
`data compression system exploits, for data compression
`purposes, the particular forms of redundancy in the original
`data.
`
`In typical computer stored data, e.g., arrays of integers,
`text, programs or the like, redundancy occurs both in the
`
`
`
`5,973,630
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`3
`repetitive use of individual symbology, e.g., digits, bytes or
`characters, and in frequent recurrence of symbol sequences,
`such as common words, blank record fields, and the like. An
`effective data compression system should respond to both
`types of redundancy.
`A further criterion important in the design of data com-
`pression and decompression systems is that of adaptability.
`Many prior art data compression procedures require prior
`knowledge, or the statistics, of the data being compressed.
`Some prior art procedures adapt to the statistics of the data
`as it is received, i.e., adaptive data compression systems, and
`others do not, i.e., non-adaptive data compressions systems.
`Where prior art procedures do not adapt to the statistics of
`the data as it
`is received, compression effectiveness is
`reduced, but where such procedures do adapt
`to the
`statistics, an inordinate degree of complexity is required in
`the data compression system. An adaptive data compression
`system may be utilized over a wide range of information
`types, which is typically the requirement in general purpose
`computer facilities while a non-adaptive data compression
`system operates optimally only on data types for which the
`non-adaptive data compression system is optimized. Thus, it
`is desirable that the data compression system achieves small
`compression ratios without prior knowledge of the data
`statistics, i.e., that the data compression system is adaptive.
`Many data compression systems currently available are
`generally not adaptable and so cannot be utilized to achieve
`small compression ratios over a wide range of data types.
`General purpose data compression procedures are known
`in the prior art that either are or may be rendered adaptive,
`two relevant procedures being the Huffman method and the
`Tunstall method. The Huifman method is widely known and
`used, reference thereto being had in an article by D. A.
`Huffman entitled “A Method for the Construction of Mini-
`mum Redundancy Codes”, Proceedings IRE, 40:10, pp.
`1098-1100 (September 1952). Further reference to the Huff-
`man procedure may be had in an article by R. Gallagher
`entitled “Variations on a Theme by Huifman”, IEEE Infor-
`mation Theory Transactions, IT-2426, (November 1978).
`Adaptive Huffman coding maps fixed length sequences of
`symbols into variable length binary words. Adaptive Huff-
`man coding suffers from the limitation that it is not ef[ica-
`cious when redundancy exists in input symbol sequences
`which are longer than the fixed sequence length the proce-
`dure can interpret. In practical implementations of the Huff-
`man procedure, the input sequence lengths rarely exceed 12
`bits due to RAM costs and, therefore, the procedure gener-
`ally does not achieve small compression ratios. Additionally,
`the adaptive Huffman procedure is complex and often
`requires an inordinately large number of memory cycles for
`each input symbol. Thus, the adaptive Huffman procedure
`tends to be undesirably cumbersome costly and slow thereby
`rendering the process unsuitable for most practical present
`day installations.
`Reference to the Tunstall procedure may be had in the
`doctoral
`thesis of B. T. Tunstall entitled “Synthesis of 55
`Noiseless Compression Codes”, Georgia Institute of
`Technology,
`(September 1967). The Tunstall procedure
`maps variable length input system sequences into fixed
`length binary output words. Although no adaptive version of
`the Tunstall procedure is described in the prior art, an
`adaptive version could be derived which, however, would be
`complex and unsuitable for high performance implementa-
`tions. Neither the Huffman nor the Tunstall procedure has
`the ability to encode increasingly longer combinations of
`source symbols.
`A further adaptive data compression system that over-
`comes some of the disadvantages of the prior art is that
`
`35
`
`40
`
`45
`
`50
`
`60
`
`65
`
`4
`disclosed in U.S. Pat. No. 4,464,650 for APPARATUS AND
`METHOD FOR COMPRESSING DATA AND RESTOR-
`
`ING THE COMPRESSED DATA, issued Aug. 7, 1984 to
`Cohen. The procedure of Cohen parses the stream of input
`data symbols into adaptively growing sequences of symbols.
`The procedure unfortunately, however, suffers from the
`disadvantages of requiring numerous RAM cycles per input
`character and utilizing time consuming and complex math-
`ematical procedures such as multiplication and division to
`effect compression and decompression. These disadvantages
`tend to render the Cohen procedure unsuitable for numerous
`economical high performance implementations.
`An even further adaptive data compression system that
`overcomes some of the disadvantages of the prior art is that
`disclosed in U.S. Pat. No. 4,558,302 for HIGH SPEED
`DATA COMPRESSION AND DECOMPRESSION APPA-
`RATUS AND METHOD, issued Dec. 10, 1985, to Welch.
`The procedure of Welch compresses an input stream of data
`symbols by storing,
`in a string table, strings of symbols
`encountered in an input stream. The Welch procedure next
`searches the input stream to determine the longest match to
`a stored string of symbols. Each stored string of symbols
`includes a prefix string and an extension character that is a
`last character in the string of symbols. The prefix string
`includes all but the extension character.
`
`When a longest match between the input data stream and
`the stored strings of symbols is determined, the code signal
`for the longest match is transmitted as the compressed code
`signal for the encountered string of symbols and an exten-
`sion character is stored in the string table. The prefix string
`of the extension character is the longest match,
`i.e., the
`longest stored string of symbols located in the search. The
`extension character of the extended string is the next input
`data character signal following the longest match.
`Searching through the string table and entering extension
`characters into the string table is effected by a limited
`searching hashing procedure. Unfortunately, even the
`improved data compression system of Welch suffers from
`less than optimal compression effectiveness, and less than
`optimal performance. As a result, the Welch procedure, like
`the Cohen procedure, is unsuitable for many high perfor-
`mance implementations.
`SUMMARY OF THE INVENTION
`
`The present invention advantageously improves upon the
`above-described approaches by providing a lossless data
`compression (i.e., creation of compressed data from uncom-
`pressed data) and decompression (i.e.,
`recovery of the
`uncompressed data from the compressed data) approach that
`improves on heretofore known data compression and
`decompression approaches.
`In one embodiment, the invention can be characterized as
`a method of compressing data for
`transmission over a
`communications channel. The method involves receiving a
`symbol, and at least one subsequent symbol; determining in
`a compression dictionary whether the symbol has a valid
`extension pointer; using, in the event the symbol does have
`a valid extension pointer,
`the valid extension pointer to
`access string extension symbols; determining, in the event
`the symbol does have a valid extension pointer, whether the
`string extension symbols equal the at least one subsequent
`symbol; determining in the compression dictionary, in the
`event the string extension symbols do not equal the at least
`one subsequent symbol, whether the symbol has a valid
`parallel pointer; repeating, in the event the symbol has a
`valid parallel pointer, the using step; repeating, in the event
`
`
`
`5,973,630
`
`5
`the string extension symbols equal the at least one subse-
`quent symbol, the determining of whether the symbol has a
`valid extension pointer; inserting, in the event the symbol
`does not have a valid extension pointer or in the event the
`symbol does not have a valid parallel pointer, a code word
`indicative of a longest string found into a compressed data
`stream; determining whether the longest string found was a
`single symbol; extending, in the event the longest string
`found was a single symbol, the longest string by one symbol;
`extending, in the event the longest string was not a single
`symbol,
`the longest string by one or more symbols;
`inserting, in the event the longest string found is extended by
`two or more symbols, a string extension signalling code
`word into the compressed data stream; and transmitting the
`compressed data stream through the communications chan-
`nel.
`In another embodiment, the invention can be character-
`ized as a method for decompressing data received over a
`communications channel. The method has steps of receiving
`a code word; determining whether the code word represents
`a single character; placing,
`in the event
`the code word
`represents a single character, the code word into an output
`data stream; determining, in the event the code word rep-
`resents more than a single character, whether the code word
`is in a dictionary; placing, in the event the code word is in
`the dictionary, a string defined by the code word into the
`output data stream; determining, in the event the code word
`is not in the dictionary, whether the code word is a next code
`word to be built; placing, in the event the code word is a next
`code word to be built, a string into the output data stream,
`the string being copied from a dictionary entry indicated by
`a previous code word processed; placing, in the event the
`code word is a next code word to be built, a first symbol of
`the string into the output data stream; and placing, in the
`event the code word is not in the dictionary and is not the
`next code word to be built, an extension string into the
`output data stream, the extension string being copied from
`the output data stream at a symbol following a last symbol
`of a dictionary entry indicated by a previous code word
`processed.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The above and other aspects, features and advantages of
`the present invention will be more apparent from the fol-
`lowing more particular description thereof, presented in
`conjunction with the following drawings wherein:
`FIG. 1 is block diagram illustrating a data compression
`system in accordance with one embodiment of the present
`invention;
`representation of a compression
`FIG. 2 is a tabular
`dictionary generated by the data compression system of FIG.
`1;
`
`FIG. 3 is a tabular representation of exemplary input (or
`original) data suitable for compression with the data com-
`pression system of FIG. 1;
`FIG. 4 is a tabular representation of exemplary com-
`pressed data generated by the data compression system of
`FIG. 1 in response to the input data of FIG. 3;
`FIG. 5 is a tabular representation of a decompression
`dictionary generated by the data compression system of FIG.
`1;
`
`FIGS. 6A, 6B and 6C are is flow charts illustrating steps
`traversed by the data compression system of FIG. 1 in order
`to compress an input data stream; and
`FIGS. 7A, 7B and 7C are flow charts illustrating steps
`traversed by the data compression system of FIG. 1 in order
`to decompress a compressed data stream.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
`Corresponding reference characters indicate correspond-
`ing components throughout the several views of the draw-
`ings.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENTS
`
`The following description of the presently contemplated
`best mode of practicing the invention is not to be taken in a
`limiting sense, but
`is made merely for the purpose of
`describing the general principles of the invention. The scope
`of the invention should be determined with reference to the
`claims.
`
`Referring to FIG. 1, a block diagram is shown of func-
`tional components of a data compression system in 10
`combination with an input data stream 12, a data channel 14
`and an output data stream 16,
`in accordance with one
`embodiment of the present invention. Shown is the input
`data stream 12, a compressor 18, a compression dictionary
`20, the data channel 14 carrying compressed data, a decom-
`pressor 22, a decompression dictionary 24 and the output
`data stream 16. The compressor 18, the compression dictio-
`nary 20, the decompressor 22 and the decompression dic-
`tionary 24 together make up the data compression system
`10. Advantageously, the data compression 10 system may be
`implemented using a general purpose computer or a special
`purpose computer (or other processor-and-memory- con-
`taining system, such as a satellite transmitter and receiver,
`cellular base station and cellular telephone, or the like) and
`appropriate software subsystems.
`In accordance with the data compression system 10 of the
`present embodiment, the first, for example, 256 code words
`of, for example, 1024 possible assignable code words are
`reserved for 256 possible hexadecimal character code rep-
`resentations of an 8 bit byte character. For example, the first
`256 code words may be assigned to extended ASCII
`(American Standard Code for Information Interchange)
`symbols, EBCDIC (Extended Binary Coded Decimal for
`Interchange Code) symbols, or the like. Thus, in accordance
`with the present embodiment, 768 code words, of the 1024
`possible assignable code words, are available for assignment
`as dictionary entries to recurring strings of bytes (i.e.,
`redundancies) built as character patterns are encountered
`within data to be compressed, i.e., during compression of the
`input data stream 12, or code words are processed within
`already compressed data, i.e., during decompression of the
`compressed data stream 14. These 768 code words are built
`and stored in the compression dictionary 20 during com-
`pression and built and stored in the decompression dictio-
`nary 24 during decompression.
`The present embodiment stops building dictionary entries
`and extending strings when the 1024th code word is built,
`however, this is a matter of design choice and can be adapted
`to particular applications of the present embodiment. With
`particular sizes of data streams, optimal performance may be
`achievable by, for example, employing 2048, 4096 or more
`code words. The code words, however many, along with
`their assigned character bytes or recurring strings of bytes
`(or, more accurately,
`their assigned pointers to assigned
`character bytes or recurring strings of bytes within the input
`data stream 12) make up the compression dictionary 20,
`which is used to translate data to be compressed (original
`data, i.e., the input data stream) into compressed data (i.e.,
`the compressed data stream), and the decompression dictio-
`nary 24, which is used to translate the compressed data back
`into the original data (i.e., the output data stream).
`The data compression system employs structure for the
`compression dictionary 20 that is different from the structure
`
`
`
`5,973,630
`
`7
`of the decompression dictionary 24. The compression dic-
`tionary 20 is defined according to the following elements
`illustrated in FIG. 2:
`
`EXTENSION POINTER (100)—A pointer to a compres-
`sion dictionary entry that defines an “extension string” to a
`current string. There are 1024 extension pointers each asso-
`ciated with a compression dictionary entry, and each com-
`pression dictionary entry is associated with a code word. The
`first 256 compression dictionary entries, which are associ-
`ated with the first 256 code words, are reserved for the 256
`hexadecimal representations of an 8 bit byte, and only have
`an extension pointer, i.e., they do not have a parallel pointer
`(see below). The remaining 768 compression dictionary
`entries, associated with the remaining 756 code words, have
`an extension pointer and a parallel pointer.
`PARALLEL POINTER (102)—A pointer to a compres-
`sion dictionary entry that defines an “extension string” to a
`previous string, a parallel string, i.e., a string that starts with
`the same character or characters as the current string and,
`thus, has the same previous string entry as the current string.
`There are 768 parallel pointers, associated with code words
`256 through 1023.
`LOCATION POINTER (104)—A pointer, into a previ-
`ously compressed area of the input dat