`Storer
`
`mi
`
`[45]
`
`Patent Number:
`
`4,876,541
`
`Date of Patent:
`
`Oct. 24, 1989
`
`[541
`
`[75]
`
`[73]
`
`[21]
`
`I22]
`
`[51]
`[52]
`
`STEM FOR DYNANIICALLY CO G
`AND DECOMPRESSING EI.ECl'RON1C
`DATJL
`
`Inventor:
`
`Assignee:
`
`James A. Stores. Lincoln. Mass.
`
`Data Compression Corporation.
`Lexington, Mass.
`
`App}. No.: 1os.929
`Filed:
`Oct. 15, 1931
`Int. CL‘
`us. (.1.
`
`[53]
`
`Field of Search
`
`H03M 7/42
`341/51; 341/57,-
`341/105
`340/347 DD; 358/161;
`341/51, 65, 6?, 69. 106
`
`Compression"; Combinatorial Algorithms on Words;
`pp. 120-121; 1935.
`T. A. Welsh; "A Technique for High-Perforrnance
`Data Compression”; IEEE Computer, vol. 17:6; 1984.
`
`Primary Examiner—Wi]liam M. Shoop, Jr.
`A.1n':tantExnmDier—Marc 5. Hot!‘
`Attorney. Agent. or .Finn—-Schiller. Pandiscio & Kusme:
`
`[57]
`
`ABSTRACT
`
`A data compression system for encoding and decoding
`textual data, including an encoder for encoding the data
`and for a decoder for decoding the encoded data. Both
`encoder and decoder have dictionaries for storing fre-
`qutly-appearing strings of characters. Each string is
`identified by a unique pointer. The input data stream is
`parsed and matched with strings in the encoder dictio-
`nary using a novel matching algorithm. The pointer
`associated withthematched stringisthentransmitted to
`a remote location for storage or decoding. Thereafter,
`using a novel update algorithm the encoder dictionary
`isupdstedtoincludenewstrings ofdatabased on the
`matched string of data. If required. a novel deletion
`algorithm is employed to remove infrequently used
`strings of data to provide room for the newly generated
`strings of data. The strings of data may be arranged
`using a modified least recently used queue.
`The decoder matches each unique pointer in the stream
`of compressed input data with a corresponding pointer
`inthedecoderdicl:ionary.Thedecoderthcntransmits
`the string of character data associated with the matched
`pointer. thereby providing textual data in original. un-
`compressed form. Thereafter, using the novel update
`and deletion algorithms, new strings of data are added
`to. and old strings of data are deleted from. the decoder
`dictionary, so as to ensure both encoder and decoder
`dictionaries contain identical strings of data.
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`4.405.650 S/1984 Eastman et aL .
`4,553,302 12/I935 Welch .
`4.612.532 9/1986 Bacon et all.
`
`................ 340/347 DD
`
`OTHER PUBLICATIONS
`
`D. A. Huffman; "A Method for the Construction of
`Minimum-Redundancy Codes"; Proceedings of the Ire;
`vol. 40. pp. 1098-1101; 1952.
`A Lempel et :3); "On the Complexity of Finite Sequen-
`ces“; IEEE Transactions of Information Theory; 22:1.
`pp. 75-81; 1976.
`V. S. Miller et al; “Variations on a Theme by Lempel
`and Ziv", Combinatorial Algorithms on Words, pp.
`131-140; 1985.
`J. B. Seery et al: “A Universal Data Compression Algo-
`rithm Deseripfion and Preliminary Results"; Technical
`Memorandum; 1977.
`J. B. Seery et al; "Further Results on Universal Data.-
`Cornpression"; Technical Memorandm; 1978.
`J. A. Storer et :11; “Data Compression Via Textual Sub-
`stitution"; Journal of the Association for Computing
`Machinery; vol. 29:4; 1932.
`J. A. Storer; "Textual Substitution Techniques for Data
`
`
`
`55 Chinis, 8 Drawing Sheets
`54
`56
`
`
`
`
`
`interrupt
`D0111mile?
`
`
`
`ORACLE 1 1 17
`
`
`
`US. Patent
`
`Oct. 24, 1939
`
`Sheet 1 of8
`
`4,876,541
`
`54-
`
`memory
`64K dynamic
`
`56 20-\
`
` memory
`controller
`
`
`
`U.S. Patent
`
`Oct. 24, 1939
`
`Sheet 2 of 3
`
`4,876,541
`
`IIO
`
`
`
`US. Patent
`
`Oct. 24, 1939
`
`Sheet 3 of 3
`
`4,876,541
`
`200
`
`
`
`
`
`transmit code for f
`
`204
`’
`
`202
`
`
`
`206
`
`(X empiyl or
`{{D full} 8: {DH(D}empty)}?
`no
`
`203
`
`deleie x from X
`
`2l6
`
`
`
`US.- Patent
`
`Oct. 24, 1939
`
`Sheet 4 of 3
`
`4,876,541
`
`
`
`louksize
`
`
`
`I ==
`
`230
`
`236
`
`cornpii] == psize
`raw{i1:= csize
`
`234
`
`output painter
`
`
`
`
` psize+cornpIi+j]
`{i'csize}* raw Li+}I
`
`442
`
`
`
`
`
`
`top/bottom < compfil/raw[i+j} ?
`
`compli) == mp
`rowlil == bcnflom
`
`/T95
`LLH BUFFER
`
`SWNG 2
`STRING 1
`r“-‘*—*“Y’““‘-'“’\“*'"””‘\
`
`* L
`
`——i——4
`umxmzz
`
`;—-———4
`k‘#___‘H~/___‘H*)
`MAXMATCH
`
`F/g. 6
`
`ubbbbub
`
`E97 7
`
`
`
`U.S. Patent
`
`Oct. 24, 1939
`
`Sheet 5 of8
`
`4,876,541
`
`MATCH
`T
`
`STRINGS ADDED
`
`H
`
`E
`_
`
`C
`
`A
`T
`_
`
`AT
`
`_
`TH
`
`E-
`
`CA
`R
`
`TH
`
`HE
`E-
`
`-c
`
`CA
`AT
`T-
`
`_A _AT
`
`AT-
`_T _TH
`
`THE THE-
`
`E_C E_CA
`CAR
`
`_AT
`E-
`THE-
`R
`
`AT
`
`R_A R_AT
`_ATE -ATE-
`E_T E_TH E_THE E_THE_
`THE_R
`
`RAT
`
`1
`
`2
`
`3
`4
`
`5
`
`6
`7
`8
`
`9
`
`IO
`II
`
`I2
`
`I3
`14
`
`I5
`:6
`:7
`I8
`
`19
`
`MATCH
`
`Lau DUEUE
`
`F zg. 8
`
`I
`
`T
`
`2 H
`3 E
`
`A _
`5 C
`5 A
`7 T
`
`3 _
`
`TH
`TH HE
`
`TH HE E-
`TH HE E- _C
`TH HE E- _C CA
`TH HE E- -C CA AT
`
`TH HE E- _C CA AT T.
`
`9 AT
`
`TH HE E- _C CA T- AT _AT _A
`
`no _
`ll TH
`
`:2 E-
`
`13 CA
`14 R
`:5 _AT
`
`TH HE E- _C CA T. AT- AT _AT _A
`HE E- _C CA T- AT- AT _AT _A TH _TH _T
`
`HE _C CA T- AT- AT _AT _A THE- THE TH _TH _T E-
`
`HE _C T- AT- AT _AT _A THE- THE TH _TH _T E_CA E_C E- CA
`-C T- AT- AT _AT .A THE- THE TH _TH _T E_CA E_C E- CAR CA
`AT THE- THE TH _TH _T E-CA E_C E; CAR CA _AT _A R_AT R-A R-
`
`I6 E_
`
`THE TH _TH _T E_CA E_C CAR CA _ATE_ _ATE _AT _A R_AT R_A R- E-
`
`:7 THE
`I8 -
`
`E_C CAR CA _ATE_ _ATE _AT _A R_AT R_A R- E_THE E_TH E-T E- THE TH
`CAR CA _ATE_ _ATE _AT _A R_AT R_A R- E_THE E_TH E_T E- THE- THE TH
`
`19 R
`20 A
`2: T
`
`CA _ATE_ _ATE _AT _A R_AT R_A R- E-THE E_TH E_T E- THE- THE TH _R
`_ATE_ _ATE _AT _A R_AT R_A R- E-THE E_TH E_T E- THE- THE TH _R RA
`_ATE _AT _A R_AT R_A R- E_THE E-TH E.T E- THE- THE TH _R RA AT
`
`Fig.
`
`.9
`
`
`
`U.S. Patent
`
`011.24, 1939
`
`Sheet 6 of8
`
`4,876,541
`
`300
`
`initialize D
`
`302
`
`
`
`
`
`
`
`
`receive code for tcmd
`convert to ihe current match!
`by dicfionary Iookup
`
`X == UH{D)
`
`30
`
`(X ernpw} or
`[{D full) 3| {DH(D}ernptyH?
`no
`
`308
`
`delete x from X
`
` 3E6
`
`
`
`US. Patent
`
`Oct. 24, 1939
`
`Sheet 7 of8
`
`4,876,541
`
`254
`
`252
`
`STRING
`POINTER
`
`
`
`
`
`
`U.S. Patent
`
`Oct. 24, 1939
`
`Sheet 8 of 3
`
`4,876,541
`
`
`
`downsize == currsize/2
`
`upsize *= curr5Ize'2
`
`
`
`4l4
`
`2
`
`
`
`down_size == down:_size:2
`currslze == currsaze _2
`
`upsize
`==upsize
`
`dOW|1§iZE == downgize/2
`currsize == currsnze /2
`upsize
`==upsize
`/2
`
`
`
`1
`
`4.876.541
`
`STEM FOR DYNAMICALI.Y COMPRESSIING AND
`DECOMPRESSING ELECTRONIC DATA
`
`The prest invention pertains to systems for encod-
`ing and decoding electronic data, and more particularly
`to systems for providing on-line, loss less compression
`and decompression of character data.
`Practical data compression methods have been
`known for several decades. For instance, an early
`method of compressing data was described by D. Huff-
`man in the article "A Method for the Construction of
`Minimum-Redundancy Codes". published in Proceed-
`ings of the IRE, vol. 40, pages 1098-1 llll. I952. Tradi-
`tional ofl'—linc data compression methods typically in-
`volve (I) scanning the data to gather statistical informa-
`tion about the nature of the data and (2) comprmsing
`the data based upon the statistical information gained
`during the scanning step. In traditional on-line data
`compression methods. a single p is made over the
`data and compression is effected based on information
`obtained during this single pass. If, in performing this
`single pass. compression is based only on some fixed
`information, then the method is said to be static. If, on
`the other hand, compression is based on information,
`gathered about the data from the data input stream
`which has already been compressed, the compression is
`said to be dynamic or adaptive.
`A theoretical background for lossless. dynamic, on-
`line compression of textual material was developed by
`A. Lempel and J. Ziv. and is described i their article
`"On Complexity of Finite Sequences", IEEE Transac-
`ricns on Information Theory. 22:l(l976), 75-31. J. B.
`Seery and I. Ziv experimented with practical imple-
`mentations of the data compression techniques devel-
`oped hy Lempel and Ziv. The results of the experiments
`by Seery and Ziv were disclosed in the articles "A
`Universal Data Compression Algorithm: Description
`and Preliminary Results", Technical Memorandum, Bell
`Laboratories, Murray Hill, N..'l., 1977. and "Further
`Results on Universal Data Compression“. Technical
`Memorandum. Bell Laboratories, Murray Hill. N.J..
`1918.
`The textual data oompron techniques of Lempel.
`Ziv and Seery involve storing frequently-appearing
`strings ofcharacters in memory. A pointer to the stored
`string of characters is transmitted in place of the full
`string when a new string of characters appears in the
`input stream that matches the stored string of charac-
`ters. When a string of characters appmrs in the input
`stream that matches a stored string of characters. but
`also includes a string of one or more new characters, the
`pointer for the matched string is transmitted along with
`the that character of the new string of characters. with
`;_he first character being transmitted in uncompressed
`orm.
`Although the data compression techniques of Lempel
`and Zlv have provably good irribr-rnation theoretic
`properties, the techniques typically require undesirably
`large amounts of computer memory and time. On the
`other hand, while the data compression techniques of
`Seery and Ziv require only a limited amount of -m
`ory. the power of the technique is limited inasmuch as
`once the memory of the data system used is full. no
`additional strings of characters can be added. As such,
`the Seery and Ziv technique does not provide optimal
`compression for frequently-changing arrangements of
`characters after the memory of the data system used is
`
`ill
`
`20
`
`25
`
`30
`
`35
`
`40
`
`$5
`
`60
`
`65
`
`2
`full. The data compression techniques of Lernpel. Ziv
`and Se-ery have been implemented in data compression
`systems of the type disclosed in U.S. Pat. No. 4,454,650
`to Eastman et al_.
`'1'. A. Welch developed data compression techniques
`that are similar to those of Sccry and Ziv. The Welch
`compression techniques are described in the article "A
`Technique for High-Performance Data Compression",
`IEEE Computer, vol. 17:6, p. 3-19, 1934. and in U.S.
`Pat. No. 4,558,302 to Welch. The Welch compression
`techniques sufl-‘er from the same drawbacks as the Scene
`and Ziv techniques. Specifically.
`in the Welch tech-
`niques the first eharacter of the new string of characters
`is transmitted in uncompressed form along with the
`matched string, with the result that less than optimal
`compression is achieved. Also. once the memory of the
`data system used is full, no new strings of data can be
`added. with the result that less than optimal compres-
`sion ia achievable when the arrangement of characters
`in the data input stream frequently changes.
`1. A. Storer and T. G. Szymanslri developed an off-
`line data compression technique that involved transmit-
`ting an index identifying a stored string of data in place
`of an identical string of data received in the data input
`stream. The Store: and Szymanslo’ technique differs
`from the technique of Lempel and Ziv in that the for-
`mer technique is off-line and is more concerned with the
`maintenance of dictionaries of stored strings including
`poittters identifying the strings, while the latter tech-
`nique is on-line and is oriented more toward the infor-
`mation theoretic properties of data compression. The
`Storer and Szymanslci
`technique is described in the
`article "Data Compression via Textual substitution“.
`Journal cfrhe Asraclationjhr Cornprrting Machinery, vol.
`29. No. 4, p. 923-951. 1982.
`V. S. Miller and M. N. Wegman developed another
`first-character growing strategy data compression tech-
`nique. The Miller/Wegman technique uses a least re-
`oently used queue for arranging stored strings of data
`for storage in memory in accordance with an identity
`(“ID") strategy. This ID strategy involves concatenat-
`ing the entire current match with the entire previous or
`last match. The Miller/Weg-man technique is described
`in the article "Variations on a theme by Lempel and
`Ziv", Combinatorial Algorithms on Words. Springer-
`Verlag (A. Apostolico and Z. Galil. editors), Berlin, p.
`131-140. I935.
`J. A. Storer developed a practical technique for corn-
`pressing data loosely based on the data compression
`technique theories disclosed by Lempel and Ziv and
`Storer and Szymanslri. The Storer technique is dis-
`closed in the article ‘Textual Suhstition Techniques for
`Data Contpresaio ", Combirtcrioncl Algorithm on
`Words (edited by A. Apostolico and Z. Galil), Springer-
`Verlag, Berlin, p. 120-121. 1935.
`In this known Storer technique, an encoder and de-
`coder are provided, each having a fixed, finite amount
`of memory. This memory, also referred to as a "dictio-
`nary", is adapted to contain a finite number of entries.
`Each entry has a unique pointer associated therewith.
`The dictionaries at the encoder and decoder may be
`initialized at the beginning of time to contain identical
`information. The encoder then repeats forever the fol-
`lowing steps:
`(1) find the longest string of characters in the input
`stream that matches an entry in the encoder dictionary;
`(2) transmit the pointer associated with the entry with
`which the match is made: and
`
`
`
`4.876.541
`
`then delete one of its
`
`3
`(3) update the dictionary;
`(a) if the dictionary is full
`entries;
`(b) add the current match.
`Similarly, the decoder repeats forever the following 5
`steps:
`(1) receive the pointer transmitted by the encoder and
`look up in the decoder dictionary the entry associated
`with the pointer;
`(2) output the entry associated with the pointer; and I0
`(3) update the dictionary
`(a) if the dictionary is lhll
`entries;
`(in) add the current match.
`The first step performed by the encoder amounts to a 15
`greedy algorithm for parsing the input stream. Storer
`indicates that better’ worst-case performance may be
`achieved by incorporating a limited amount of look-
`ahead in the first step performed by the encoder.
`Yet another system for dynamically compressing data
`is disclosed in U.S. Pat. No. 4,612,532 to Bacon et al.
`The compression system of Bacon er al does not rely on
`the compression theories developed by Lempel and
`Ziv. Instead, in the Bacon et al system a dynamically
`changing table is maintained based on conditional prob-
`abilities. Input data is compressed and decornpreased in
`accordance with information provided front the table.
`A primary object of the present invention is to pro-
`vide lossless textual data compression at relatively high
`rates of compression in real time for data provided over
`commercial serial data sources such as modems.
`Another object of the present invention is to provide
`a system for compressing textual data without losses
`according to novel algorithms for parsing and matching
`input data strings with strings in the dictionary of the 35
`system. including an algorithm for improving parsing of
`the input data stream by looking ahead at a limited
`amount of the data input stream.
`Yet another object of the present invention is to pro-
`vide a system for compressing textual data without
`losses according to learning algorithms for generating
`new strings to be added to the dictionary of the system
`that are distinct from. and novel relative to. the first
`character growing strategies of Lempel, Ziv, Seery and
`Welch and the ID growing strategy of Miller and Weg~
`man.
`
`4
`with which arnatch is made is then transmitted. This
`matching is effected using a conventional greedy algo-
`rithm. or alternatively, a novel limited-loolr ahead algo-
`rithm. The system is designed so that matches at least
`one character long can always be made with ones of the
`stored strings. As a result, a pointer is always transmit-
`ted after a match has been made, even if the match is
`one character long. This contrasts with the compression
`techniques of Seery. Ziv and Welch where matches
`cannot always be made, and therefore single characters
`are often transmitted in unoomprmsed form.
`The update learning algorithm adds N new strings of
`characters to the encoder and decoder dictionaries,
`where N equals the number of characters in the newly
`matched string of data characters Y received from the
`input data stream. The learning algorithm creates the
`firs: new string of characters by adding to the string of
`characters in the dictionary last or previously matched
`with a string of input data the lirst character in the
`string of characters Y. The second new string of charac-
`ters includes the lasbmatched string of characters plus
`the first two characters in the new string of characters
`Y. Successive new strings are similarly created until N
`new strings of characters have been created, with the
`last new string of characters consisting of the Imit-
`rnatched string of characters and the entire new string
`of characters Y. A modified version of this update
`learning algorithm is also disclosed herein, wherein the
`number of strings that can depend from a single non-
`root node of the dictionary is limited.
`The invention includes a deletion algorithm for at-
`ranging the dictionary strings in a modified least re-
`cently used queue, in accordance with a special reverse
`positioning procedure. Using this procedure, the most
`frequently used strings are positioned near the front of
`the queue and the least frequently used strings are posi-
`tioned near the end of the queue.
`The deletion algorithm deletes from the encoder
`dictionary the approximately lust recently used strings
`of characters to provide room for newly generated
`strings of characters. The deletion algorithm is designed
`to never delete the basic character set with which the
`encoder and decoder dictionaries are initialized and
`may be designed to never delete other characters or
`strings of characters.
`In one embodiment of the invention, both encoder
`and decoder comprise several dictionaries of varying
`lengths of memory. The dictionaries are operated in
`parallel with one another and the rates of compression
`achieved by the dictionaries are monitored. The dictio-
`nary achieving the best rate of compression is used so
`long as it out performs the other dictionaries.
`The present invention is a data compression system
`for compressing and decompressing a stream or se-
`quence of digital data character signals. The system
`comprises an encoder for compressing the input data
`signals. and a decoder for decompressing the com-
`pressed data signals. Both encoder and decoder have a
`read-write memory for storing a. “dictionary" or string
`table of frequently-appearing strings of characters.
`Each string is identified by a unique pointer or code.
`Compression is effected by parsing the data input
`stream into strings of data characters and then matching
`each of the parsed strings with a string in the encoder
`dictionary. The pointer associated with each matched
`string is transmitted in place of the string.
`Decompression is effected by matching each pointer
`tranmtted by the encoder with a correspondirtg
`
`then delete one of its
`
`invention is to
`Still another object of the present
`provide a system for compressing textual data without
`losses according to novel algorithms for arranging and
`deleting strings of data from the dictionary of the sys-
`tem. including the option of selectively “freezing” a set
`of strings in the dictionary.
`Yet another object of the present invention is to pro-
`vide a system for compressing textual data in which
`rates of data compression are maximized by dynami-
`cally varying the size the system's dictionaries.
`These and other objects of the present invention are
`achieved by a data compression system comprising a
`data compression module having an encoder for com-
`pressing and transmitting compressed textual data and a 60
`decoder for receiving and decotnpressing the com-
`pressed terttual data in accordance with novel learning
`algorithms. Both the coder and decoder comprise a
`dictionary in which a plurality of frequently-appearing
`strings of characters are stored. Each string is identified
`by a unique pointer. The encoder receives and matches
`strings of input data with strings of characters stored in
`the encoder dictionary. The pointer for the stored string
`
`55
`
`
`
`5
`pointer in the decoder's dictionary. The string in the
`decoder's dictionary associated with the matched
`pointer is then transmitted. completing the deoomprs-
`sion process.
`After each match is made, both the encoder and de-
`coder‘ dictionaries are updated to include new or reposi-
`tionedstringsofdatabasedonthedataenterlngthe
`dictionary. This updating is affected such that the dic-
`tionaries of the encoder and decoder contain identical
`strings.
`The present invention may be advantageously em-
`ployed in both the transmission and storage of elec-
`tronic data.
`
`25
`
`30
`
`BRIEF DESCRIPTION OF THE DRAWING
`
`FIG. 1 is a block diagram of the preferred embodi-
`ment of the present invention;
`FIG. 2 is an example of the contents of the string
`table or dictionary maintained in memory 22;
`FIG. 3 shows the modified trie organization of the
`data contained in the dictionary:
`FIG. 4 is a new diagram of the data compression
`algorithm used by data compression module 2|};
`Flchsisaflowdisgramoftltelimitedlookahead
`heuristic (LLH);
`FIG. 6 shows the LLB buffer;
`FIG. 7 is an example of an input stream;
`FIG. 8 illustrates the manner in which new strings of
`data are generated using the pure all~prel'nrcs heuristic
`(PAPH):
`FIG. 9 illustrated the manner in which strings of data
`are positioned and deleted using the modified least-
`recently—used heuristic (MLRUI-I);
`FIG. 10 is a flow diagram of the data decompression
`algorithm used by module 20;
`FIG. 11 is an example of the contents of dictionary
`150;
`FIG. 12 shows an example of dynamic dictionary size
`selection;
`FIG. 13 is a flow diagram illustrating the implementa-
`tion of the dynamic dictionary size selection procedure.
`Referring to FIG. 1. the present invention comprises
`one or more data compression modules 20. Each mod-
`ule 2|l performs both data compression and data decom-
`pression. Thus. for data storage only a single data com-
`pression module is required. For data transmission be-
`tween two separate stations, a data compression module
`20 must be provided at both the transmitting station and
`the receiving station. As a conceptual tooL compression
`will be described as occurring at an encoder and decom-
`pression will be described as occurring at a decoder. In
`practical implementation, the encoder and decoder are
`contained in a single data compression module 20.
`Data compression module 20 comprises dynamic
`memory 22. read-only memory 24, memory controller
`26 and microprocessor 28. Dynamic memory 22 is pref-
`erably a 64KB CMOS static RAM and is satisfactorily
`implemented using F166] AM chips. Read-only mem-
`ory 24 is preferably a 32KB CMOS-masked ROM.
`Memory 24 may be satisfactorily implemented rising an
`Ll-12325‘? chip. Memory controller 26 is a DMA con-
`troller that may be satisfactorily implemented using an
`BZC37 chip. Microprocessor 28 is preferably a 16-bit
`micro with a 3.9 mil: clock. The microprocessor may
`be satisfactorily implemented using an 8088C chip.
`Dynamic memory 22 is connected to memory con-
`troller 26 so that two-way data communication may be
`achieved therebetween. Read-only memory 24 is con-
`
`4. 876, S41
`
`6
`nected to memory controller 26 so that two-way data
`communication may be achieved therebetween. Mem-
`ory controller 26 is connected to microprocessor 28 so
`that two-way data communication may be achieved
`therebetween.
`Dynamic memory 22 is provided for holding input
`data, dictionary date, and other data used by micro-
`procwsor 28 for compressing and decompreseing input
`data. Read-only memory 24 is provided for storing the
`firmware used by microprocessor 28 for compressing
`and decompressing input data. Memory controller 26 is
`provided for controlling communication between rui-
`croproceasor 28 and dynamic memory 22 and micro-
`processor 28 and read only memory 24.
`Data compression module 20 also comprises interrupt
`controller 30, serial input/output port 32. serial input-
`/output port 34-, by-pass switch 36, pin connector 38.
`pin connector 40. X-switch 1-2 and ){-switch 44. Inter-
`rupt controller 30 may be satisfactorily implemented
`using a SZCS9 chip. Serial input/output ports 32 and 34
`may be satisfactorily implemented using 82C5lA or
`8250 ports. Bypass switch 34 is conventional a two-way
`6-pole switch. Pin connectors 33 and 40 are also con-
`ventional male 25 pin data communication connectors.
`Microprocessor 28 is connected to interrupt control-
`Ier 30 so that two-way data communication may be
`achieved therebetween. Preferably, an 8-bit wide data
`bus is used to couple microprocessor 28 with interrupt
`controller 30. Alternatively, to increase the rate of data
`u'ansmism'on (at an increased hardware cost), a 16-bit
`wide data bus may be ployed. Serial input/output
`port 32 is connected to interrupt controller 30 so that
`two-way data communication may he achieved therehe-
`tween. Serial input/output port 34 is connected to inter-
`rupt controller 3|.'I so that two-way data communication
`may be achieved therebetwcen. Preferably, an 8-bit
`wide data bus is used to couple interrupt controller 38
`with 1/0 ports 32 and 34. Alternatively. as discussed
`above. a 16-bit wide data bus may be employed for
`coupling the controller 30 with 1/0 ports 32 and 34».
`Serial input/output port 32 is connected by bus 46 to
`pin connector 38 so that two-way data communication
`may be achieved between port 32 and connector 38.
`The latter is adapted for attachment to a "host" or data
`device 50, such as a source of serial binary-encoded
`data. e.g. a microcomputer. or a terminal. Serial input-
`/output port 34 is connected by bus 48 to pin connector
`40 so that
`two-way data communication may be
`achieved between port 34 and connector 40. The latter
`is adapted for attachment to a communication link or
`data device 52, e_g. a modem or a secondary storage
`SOUTCC.
`
`X-switch 4:.’ is connected to the data input and data
`output pins of pin connector 38. }(-switch 44 is con-
`nected to the data input and data output pins of pin
`connector 40.
`Bypass switch 36 is connected to both but 46 and bus
`48.
`Interrupt controller 30 is provided for alternately
`coupling serial ports 32 and 34 with microprocessor 28.
`Bypass switch 36 is provided to allow a user of data
`compression module 20 to disable the module so that
`data will pass uncompressed through the module. It
`may be desirable to pass data uncompressed through
`module 20 when such data is to be unmanned to a
`source other than another data compression module 20
`and the user does not choose to disconnect the data
`device 50 from the data compression module. For in-
`
`
`
`4,876,541
`
`7
`stance. when a first data compression module 20 is used
`to communicate data to both a second data compression
`module and a second receiver not having a data com-
`pression module 211, the user will close bypass switch
`36, thereby disabling the compression operation of the
`first data compression module, whenever it is desired to
`transmit data in uncompressed form to the second re-
`ceivcr.
`Because the data-in and data-out pins of serial input-
`/output port 32 are often reversed with respect to the no
`corresponding pins on the output cable of data device
`50. ){-switch 42 is provided to ensure proper connection
`is achieved. By changing the operating state of X-
`switch 42, the data-in pin on connector 38 may be con-
`nected to the data-out pin of the output cable, or vice
`versa. as the piri-arrangement scheme of the data device
`requires. Clearly, a similar connection to serial input-
`/output port 32 may be made with other types of data
`devices so.
`As with X-switch 42, the operating state of ){-switch
`44 may be changed, if required, so that the data-in and
`data-out pins of data device 52 are correspondingly
`respectively connected to the data-in and data-out pins
`of serial input/output port 34.
`Clock 54 is connected to data compression module 10
`in known manner to control the timing of the various
`components of the module. as is well known in the art.
`Clock 54 may be implemented using any suitable semi-
`conductor cloclc chip.
`Power supply 56 is connected to clock 54 and data
`compression module 2|) in known manner to power the
`clock and the module. Power supply 56 is a conven-
`tional dc. power supply.
`The operation of data compression module 20 is con-
`trolled by instructions sent via data device 5|] prior to or
`during the transmission of input data from device Sit.
`For i.nstance. encode and decode signals may be pro-
`vided to module 20 via data device 5|} to instruct the
`module to compress or decompress the input data it
`receives. as described hereinafter. Similarly, data char-
`acters may be added to the module’s dictionaries for
`storage therein, as described hereinafter. using data
`device 50. Thus, operation of module 2|} may be ef-
`fected from a remote location, so long as data device 50
`is connected thereto.
`For data compression module 20 to function prop-
`erly, data device 50 must be capable of receiving and
`responding to an iron/roll‘ protocol. This protocol is
`used to disable the transmission of characters from de-
`vice 50, as when the device transmits characters at a
`faster band rated than data compression module 20 can
`accommodate. When module 20 cannot accept charac-
`ters at the rate transmitted by device 50, the module 20
`sends an roll‘ signal (typically ctrl/s, if ASCII charac-
`ters are being used) that when received causes the data 55
`device to terminate the transmission of characters.
`Module 20 transmits an iron signal (typically ctrl/q, if
`ASCII characters are used) when the module is ready
`toreceive characters, so as toinstruotdata device 50 to
`begin transmitting characters.
`Data compression module 20 transmits the son/soft‘
`protocol in accordance with instructions stored in read-
`only memory 24. Using these instructions and dynamic
`memory 22. microprocessor 28 transmits the son or soil’
`protocol signals to data device 50, as required.
`Data compression module 20 has four modes of oper-
`ation: a bypass mode. an encode mode. a decode mode
`and a simultaneous encode/decode mode. Module 10 is
`
`placed in the bypass, encode, decode or encode/decode
`modes by transmitting a bypass, encode, decode or
`encode/decode signal. respectively, from the data de-
`vice 50 at the beginning of transmission of the character
`data. Upon receipt of a bypass signal, read-only mem-
`ory 14 instructs module 2|] to pass input data through in
`uncompressed form. Upon receipt of the encode signal,
`read-only memory 21- accesses a data compression algo-
`rithm storcd therein that is used, as described hereinaf-
`ter. by module 2|] to compress input character data.
`Similarly, upon receipt of the decode signal. memory 24
`accesses a data decompression algorithm stored therein
`that is used, as dmcribed hereinafter. by module 20 to
`decompress compressed input character data. Likewise.
`when an encode/decode signal is received, memory 24
`simultaneously accesses the data compression and data
`decompression algorithms stored in the memory so as to
`simultaneously compress input character data and de-
`compress compressed input character data.
`In addition, module 20 may be placed in the bypass
`mode by closing switch 36. as noted above. Bypass
`switch 36 is provided to allow i.nput data to be passed
`through in uncompressed form when it is not desired to
`place module 20 in the bypass module using instructions
`transmitted from data device 50, e.g. when module 20
`does not respond to such instructions due to malfunc-
`tion of the module.
`
`MEMORY STRUCTIJRE
`
`Referring to FIG. 2, data compression and decom-
`pression Ls performed in data compression module 2|]
`using a string table or dictionary 1011 that is maintained
`in dynamic memory 22. The dictionary comprises a
`plurality of strings 102. Each string 102 comprises a
`collection of data character signals that represent one or
`more numbers or textual characters. The present inven-
`tion is adapted. to perform lossless compression with
`any data character format code.
`Theoretically. dictionary 100 may comprise an infi-
`nite number of strings 102. It has been determined,
`however,
`that satisfactory compression is achievable
`with data compression module 20 when the number of
`strings 102 provided in dictionary 100 ranges from be-
`tween 211(4096) to 215(65,536) strings. Preferably. dic-
`tionary Itltl comprises 21? strings 102. Dictionaries hav-
`ing lesser or greater numbers of strings, however. may
`also be satisfactorily employed.
`An initial character set is stored in dictionary 100
`comprising each character of the format code used. as
`well as other combinations of characters. Preferably.
`this initial character set has 256 elements, although
`character sets having a greater or lesser number of
`elements may also be satisfactorily employed. Thus, for
`instance, an
`character set having 256 elements
`might include the 128 characters of a known format
`code, as well as other frequently-appearing characters
`or combinations of characters. Each element of the
`initial character set constitutes a string 102. The strings
`that together contain the
`character set are p