US006597812B1

(12) United States Patent
Fallon et al.

(10) Patent No.: US 6,597,812 B1
(45) Date of Patent: Jul. 22, 2003

(54) SYSTEM AND METHOD FOR LOSSLESS DATA COMPRESSION AND DECOMPRESSION

(75) Inventors: James J. Fallon, Armonk, NY (US); Steven L. Bo, Bayside, NY (US)

(73) Assignee: Realtime Data, LLC, New York, NY (US)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/579,922

(22) Filed: May 26, 2000

Related U.S. Application Data

(60) Provisional application No. 60/136,561, filed on May 28, 1999.

(51) Int. Cl.7 ................................ G06K 9/36
(52) U.S. Cl. ................................ 382/232; 382/245; 341/51
(58) Field of Search ........................ 382/232-251; 341/51, 79; 348/420-424; 375/240-241

(56) References Cited

U.S. PATENT DOCUMENTS

[The cited-reference table is only partly legible in this scan; identifiable entries include 5,883,975 A, 2/1999, Franaszek et al. (382/232); 5,955,970 A, Narita et al.; 6,195,024 B1, 2/2001, Fallon (341/51); 6,439,902 B2, 12/2002, Heath (341/87); and at least one further Heath reference classified in 341/87.]

* cited by examiner

Primary Examiner—Jingge Wu
(74) Attorney, Agent, or Firm—F. Chau & Associates, LLP; Frank V. DeRosa, Esq.

(57) ABSTRACT

Systems and methods for providing lossless data compression and decompression are disclosed which exploit various characteristics of run-length encoding, parametric dictionary encoding, and bit packing to comprise an encoding/decoding process having an efficiency that is suitable for use in real-time lossless data compression and decompression applications. In one aspect, a method for compressing input data comprising a plurality of data blocks comprises the steps of: detecting if the input data comprises a run-length sequence of data blocks; outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected; maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string; building a data block string from at least one data block in the input data that is not part of a run-length sequence; searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and outputting the code word representing the built data block string.

30 Claims, 6 Drawing Sheets

[Front-page drawing: the encoder of FIG. 1, showing an input data stream feeding a run-length encoder (13) within the encoder block and producing an encoded output.]
[Sheet 1 of 6: FIGURE 1, block diagram of the data compression system (input data stream, input buffer, encoder, encoded output).]
[Sheet 2 of 6: FIGURE 2A, first part of the encoding flow diagram (steps 200 through 208, 226, 227). Legible boxes: Initialize Dictionary And Hash Table (200); Initialize Pstring To Empty (201); Are There Input Bytes To Process? (202); Output Code Corresponding To Pstring (226); Output Code Denoting End Of Input Data (227); Read Next Input Byte And Store In C (203); Check Next Consecutive Input Bytes; Are There At Least s Consecutive Matching Input Bytes For Run Length?; Is Pstring Empty?; Output Code For Pstring; Output Run Length Sequence.]
[Sheet 3 of 6: FIGURE 2B, second part of the encoding flow diagram (steps 210 through 216, 220 through 224). Legible boxes: Generate Pstring+C (210); Search Dictionary For Pstring+C (211); Is Pstring+C In Dictionary? (212); Store Dictionary Index Of Matching String In Mcode; Set Pstring = Pstring+C; Output Code For Pstring (215); Create Dictionary Entry For Pstring+C (216); Would Addition Of Entry To Dictionary Exceed Threshold?; Reset Dictionary To Initial State (220); Reset Hash Table; Output Code Word Indicating Dictionary Initialization (222); Search Dictionary To Find Entry For Pstring (224); Store Dictionary Index Of Matching Entry In Mcode; Add New Entry To End Of Dictionary; Update Hash Table; Set Pstring = C.]
[Sheet 4 of 6: FIGURE 3, block diagram of the data decompression system (encoded data stream, input buffer, run-length decoder, dictionary decoder, decoded output); the rotated drawing text is largely illegible in this scan.]
[Sheet 5 of 6: FIGURE 4A, first part of the decoding flow diagram (steps 400 through 408). Legible boxes: Initialize Dictionary; Initialize Pstring And Cstring To Empty; Read Code Word From Encoded Input Stream And Store It In Ccode; If Ccode=2, End Decoding Process; If Ccode=1, Process Next Successive Run Length Encoded Bytes In Encoded Input Stream; Output Decoded Characters Corresponding To Run Length Encoded Sequence; If Ccode=0, Return To Step 400; Lookup Cstring (Dictionary Entry For Ccode) And Output It (408).]
[Sheet 6 of 6: FIGURE 4B, second part of the decoding flow diagram (steps 409 through 422). Legible boxes: Set Pcode = Ccode (409); Read Next Code From Encoded Input Stream And Store It In Ccode (410); Is Ccode A Control Code? (411); If Ccode=2, End Decoding Process; If Ccode=1, Process Next Successive Run Length Encoded Bytes In Encoded Input Stream; Output Decoded Characters Corresponding To Run Length Encoded Sequence; If Ccode=0, Return To Step 400; Is Ccode In Dictionary?; Lookup Cstring (Dictionary Entry For Ccode) And Output It; Take First Character From Cstring And Store It In C; Add Pstring+C To Dictionary; Store First Character Of Cstring In C; Add Pstring+C To Dictionary; Output Pstring+C.]
SYSTEM AND METHOD FOR LOSSLESS DATA COMPRESSION AND DECOMPRESSION

CROSS-REFERENCE TO RELATED APPLICATION

This application is based on provisional application U.S. Application Ser. No. 60/136,561, filed on May 28, 1999.

BACKGROUND

1. Technical Field

The present invention relates generally to data compression and decompression and, more particularly, to systems and methods for providing lossless data compression and decompression using a combination of dictionary and run length encoding.

2. Description of Related Art

Information may be represented in a variety of manners. Discrete information such as text and numbers are easily represented in digital data. This type of data representation is known as symbolic digital data. Symbolic digital data is thus an absolute representation of data such as a letter, figure, character, mark, machine code, or drawing.

Continuous information such as speech, music, audio, images and video frequently exists in the natural world as analog information. As is well-known to those skilled in the art, recent advances in very large scale integration (VLSI) digital computer technology have enabled both discrete and analog information to be represented with digital data. Continuous information represented as digital data is often referred to as diffuse data. Diffuse digital data is thus a representation of data that is of low information density and is typically not easily recognizable to humans in its native form.

There are many advantages associated with digital data representation. For instance, digital data is more readily processed, stored, and transmitted due to its inherently high noise immunity. In addition, the inclusion of redundancy in digital data representation enables error detection and/or correction. Error detection and/or correction capabilities are dependent upon the amount and type of data redundancy, available error detection and correction processing, and extent of data corruption.

One outcome of digital data representation is the continuing need for increased capacity in data processing, storage, retrieval and transmittal. This is especially true for diffuse data where continuing increases in fidelity and resolution create exponentially greater quantities of data. Within the current art, data compression is widely used to reduce the amount of data required to process, transmit, store and/or retrieve a given quantity of information. In general, there are two types of data compression techniques that may be utilized either separately or jointly to encode and decode data: lossy and lossless data compression.

Lossy data compression techniques provide for an inexact representation of the original uncompressed data such that the decoded (or reconstructed) data differs from the original unencoded/uncompressed data. Lossy data compression is also known as irreversible or noisy compression. Negentropy is defined as the quantity of information in a given set of data. Thus, one obvious advantage of lossy data compression is that the compression ratios can be larger than that dictated by the negentropy limit, all at the expense of information content. Many lossy data compression techniques seek to exploit various traits within the human senses
to eliminate otherwise imperceptible data. For example, lossy data compression of visual imagery might seek to delete information content in excess of the display resolution or contrast ratio of the target display device.

On the other hand, lossless data compression techniques provide an exact representation of the original uncompressed data. Simply stated, the decoded (or reconstructed) data is identical to the original unencoded/uncompressed data. Lossless data compression is also known as reversible or noiseless compression. Thus, lossless data compression has, as its current limit, a minimum representation defined by the negentropy of a given data set.

It is well known within the current art that data compression provides several unique benefits. First, data compression can reduce the time to transmit data by more efficiently utilizing low bandwidth data links. Second, data compression economizes on data storage and allows more information to be stored for a fixed memory size by representing information more efficiently.

A rich and highly diverse set of lossless data compression and decompression algorithms exist within the current art. These range from the simplest "ad hoc" approaches to highly sophisticated formalized techniques that span the sciences of information theory, statistics, and artificial intelligence. One fundamental problem with almost all modern approaches is the compression ratio versus the encoding and decoding speed achieved. As previously stated, the current theoretical limit for data compression is the entropy limit of the data set to be encoded. However, in practice, many factors actually limit the compression ratio achieved. Most modern compression algorithms are highly content dependent. Content dependency exceeds the actual statistics of individual elements and often includes a variety of other factors, including their spatial location within the data set.

Within the current art there also presently exists a strong inverse relationship between achieving the maximum (current) theoretical compression ratio, referred to as "algorithmic effectiveness", and requisite processing time. For a given single algorithm the "effectiveness" over a broad class of data sets including text, graphics, databases, and executable object code is highly dependent upon the processing effort applied. Given a baseline data set, processor operating speed and target architecture, along with its associated supporting memory and peripheral set, "algorithmic efficiency" is defined herein as the time required to achieve a given compression ratio. Algorithmic efficiency assumes that a given algorithm is implemented in an optimum object code representation executing from the optimum places in memory. This is virtually never achieved in practice due to limitations within modern optimizing software compilers. In addition, an optimum algorithmic implementation for a given input data set may not be optimum for a different data set. Much work remains in developing a comprehensive set of metrics for measuring data compression algorithmic performance; however, for present purposes the previously defined terms of algorithmic effectiveness and efficiency should suffice.

Of the most widely utilized compression techniques, arithmetic coding possesses the highest degree of algorithmic effectiveness but, as expected, is the slowest to execute. This is followed in turn by dictionary compression, Huffman coding, and run-length coding techniques with respectively decreasing execution times. What is not apparent from these algorithms, and what is also one major deficiency within the current art, is knowledge of their algorithmic efficiency. More specifically, given a compression ratio that is within
the effectiveness of multiple algorithms, the question arises as to their corresponding efficiency on various data sets.

SUMMARY OF THE INVENTION

The present invention is directed to systems and methods for providing lossless data compression and decompression. The present invention exploits various characteristics of run-length encoding, parametric dictionary encoding, and bit packing to comprise an encoding/decoding process having an efficiency that is suitable for use in real-time lossless data compression and decompression applications.

In one aspect of the present invention, a method for compressing input data comprising a plurality of data blocks comprises the steps of:

detecting if the input data comprises a run-length sequence of data blocks;

outputting an encoded run-length sequence, if a run-length sequence of data blocks is detected;

maintaining a dictionary comprising a plurality of code words, wherein each code word in the dictionary is associated with a unique data block string;

building a data block string from at least one data block in the input data that is not part of a run-length sequence;

searching for a code word in the dictionary having a unique data block string associated therewith that matches the built data block string; and

outputting the code word representing the built data block string.
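By way of a rough, non-authoritative sketch, the steps recited above can be combined for byte-size data blocks as in the following Python fragment. The names (compress, MIN_RUN, RESET, RUN, END), the minimum run length of four bytes, the 4096-code limit, and the exact index numbering are illustrative assumptions rather than details taken from the patent; the output is left as a list of integer code words, with bit packing treated separately later in the specification.

    # Illustrative sketch only: byte-oriented data blocks, hypothetical control
    # codes (0 = dictionary reset, 1 = run-length sequence, 2 = end of data).
    MIN_RUN = 4                      # assumed minimum run length that triggers run-length encoding
    RESET, RUN, END = 0, 1, 2        # assumed control code words

    def init_dictionary():
        # One entry per possible single-byte string; control codes occupy 0-2.
        return {bytes([b]): b + 3 for b in range(256)}

    def compress(data: bytes, max_codes: int = 4096):
        dictionary = init_dictionary()
        out = []
        pstring = b""                              # working string being matched
        i = 0
        while i < len(data):
            # Run-length detection: count identical bytes starting at position i.
            run = 1
            while i + run < len(data) and data[i + run] == data[i]:
                run += 1
            if run >= MIN_RUN:
                if pstring:                        # flush the pending dictionary string
                    out.append(dictionary[pstring])
                    pstring = b""
                out.extend([RUN, data[i], run])    # encoded run-length sequence
                i += run
                continue
            # Dictionary encoding: extend the working string while it still matches.
            c = data[i:i + 1]
            if pstring + c in dictionary:
                pstring = pstring + c
            else:
                out.append(dictionary[pstring])            # code word for the built string
                if len(dictionary) + 3 >= max_codes:       # dictionary full: reset it
                    out.append(RESET)                      # and tell the decoder so
                    dictionary = init_dictionary()
                else:
                    dictionary[pstring + c] = len(dictionary) + 3   # learn a new string
                pstring = c
            i += 1
        if pstring:
            out.append(dictionary[pstring])
        out.append(END)
        return out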
In another aspect of the present invention, the dictionary is dynamically maintained and updated during the encoding process by generating a new code word corresponding to a built data block string, if the built data block string does not match a unique data block string in the dictionary, and then adding the new code word in the dictionary.

In yet another aspect of the present invention, the dictionary is initialized during the encoding process if the number of code words (e.g., dictionary indices) in the dictionary exceeds a predetermined threshold. When the dictionary is initialized, a code word is output in the encoded data stream to indicate that the dictionary has been initialized at that point in the encoding process. An initialization process further comprises resetting the dictionary to only include each possible code word corresponding to a unique data block string comprising a single data block. By way of example, if each data block comprises a byte of data, there will be 256 possible code words for a data block string comprising a single byte. In this instance, the dictionary reset to its initial state will comprise 256 entries.

In another aspect of the present invention, the dictionary further comprises a plurality of control code words, wherein a control code word is designated to represent a dictionary initialization, a run-length encoded sequence, and the end of the input data (or completion of the encoding process). These control words are used in the decoding process to re-create the input data.

In yet another aspect of the present invention, a bit-packing process is employed to pack the bits of successive output code words representing encoded run-length sequences and data block strings.

In another aspect of the present invention, a method for decompressing an encoded data stream comprising a plurality of code words, which is generated using the encoding method, comprises the steps of:

maintaining a dictionary comprising a plurality of code words utilized to generate the encoded data stream,
wherein the code words in the dictionary comprise control code words and code words that are each associated with a unique data block string;

decoding and outputting a run-length sequence of data blocks associated with an input code word of the encoded data stream, if the input code word is a control code word in the dictionary that indicates an encoded run-length sequence;

outputting a unique data block string in the dictionary that is associated with an input code word of the encoded data stream, if the input code word is found in the dictionary; and

if the input code word is not found in the dictionary, building a new data block string comprising (1) the unique data block string associated with a previous code word found in the dictionary and (2) the first data block of the unique data block string, adding the new string to the dictionary, and outputting the new string.
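A matching sketch of these decoding steps is given below, using the same hypothetical control code values (0 for dictionary reset, 1 for a run-length sequence, 2 for end of data) and index numbering as the compression sketch above; the function name and dictionary layout are again assumptions made for illustration. Under those assumptions the two sketches are intended to round-trip, so decompress(compress(data)) should reproduce the original bytes.

    # Illustrative decoder for the code-word stream produced by the sketch above.
    RESET, RUN, END = 0, 1, 2        # assumed control code words

    def init_decode_dictionary():
        # Index 3 + b holds the single-byte string for character code b.
        return {b + 3: bytes([b]) for b in range(256)}

    def decompress(codes):
        dictionary = init_decode_dictionary()
        next_code = 259                  # first index available for learned strings
        out = bytearray()
        prev = None                      # string decoded from the previous code word
        it = iter(codes)
        for code in it:
            if code == END:              # end of the encoded data stream
                break
            if code == RESET:            # dictionary was re-initialized by the encoder
                dictionary = init_decode_dictionary()
                next_code = 259
                prev = None
                continue
            if code == RUN:              # run-length sequence: character code, then count
                value, count = next(it), next(it)
                out.extend(bytes([value]) * count)
                prev = None
                continue
            if code in dictionary:
                entry = dictionary[code]
            else:                        # code word not yet in the dictionary: it must be
                entry = prev + prev[:1]  # the previous string plus its own first byte
            out.extend(entry)
            if prev is not None:         # mirror the dictionary growth of the encoder
                dictionary[next_code] = prev + entry[:1]
                next_code += 1
            prev = entry
        return bytes(out)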
These and other aspects, features and advantages of the present invention will become apparent from the following detailed description of preferred embodiments, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system for providing lossless data compression according to an embodiment of the present invention;

FIGS. 2A and 2B comprise a flow diagram of a method for providing lossless data compression according to one aspect of the present invention;

FIG. 3 is a block diagram of a system for providing lossless data decompression according to an embodiment of the present invention; and

FIGS. 4A and 4B comprise a flow diagram of a method for providing lossless data decompression according to one aspect of the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

The present invention is directed to systems and methods for providing lossless data compression and decompression. It is to be understood that the present invention may be implemented in various forms of hardware, software, firmware, or a combination thereof. In particular, the present invention may be implemented in hardware comprising general purpose microprocessors, digital signal processors, and/or dedicated finite state machines. Preferably, the present invention is implemented as an application program, tangibly embodied on one or more data storage mediums, which is executable on any machine, device or platform comprising suitable architecture. It is to be further understood that, because the present invention is preferably implemented as software, the actual system configurations and process flow illustrated in the accompanying Figures may differ depending upon the manner in which the invention is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

Data Compression

Referring now to FIG. 1, a block diagram illustrates a system 10 for providing lossless data compression according to an embodiment of the present invention. In general, the data compression system 10 comprises an input buffer 11 for
temporarily buffering an input data stream and an encoder 12 for compressing the input data stream. It is to be understood that the compressed data stream output from the encoder may, for example, be stored in a storage medium for subsequent retrieval and decoded using a decompression method described below, or transmitted over a local or global computer network (for purposes of increased bandwidth transmission) and decompressed at a desired location. It is to be further understood that the input buffer 11 is an optional component that may be employed, for example, in real-time compression applications where the rate of compression of the encoder 12 is slower than the bandwidth of the input data stream.

In general, the encoder 12 employs a unique combination of compression techniques preferably including run-length encoding and hash table dictionary encoding to compress an input data stream, as well as bit-packing to increase the final compression ratio. More specifically, the encoder 12 comprises a run-length encoder 13 and dictionary encoder 14, both of which utilize a code word dictionary 15 to output one or more “code words” representing a “character string” identified by the respective encoder 13, 14 in the input data stream. It is to be understood that the term “character” as used herein refers to an input byte of data that can take on any one of 256 values, and the term “string” as used herein refers to a grouping of one or more characters (bytes). Furthermore, as described in further detail below, in a preferred embodiment, a “code word” for a given character string comprises a dictionary index (denoted herein as D[i]) of the character string in the dictionary 15.

During an encoding process in which bytes of data in the input stream are input to the encoder 12, the run-length encoder 13 will identify a run-length sequence in the data stream, i.e., a character string comprising a plurality of consecutively similar characters (bytes), and output one or more code words from the dictionary 15 to represent the run-length sequence (as explained in detail below). Moreover, the dictionary encoder 14 will build a character string comprising two or more characters (which does not comprise a run-length sequence), search the dictionary 15 for a code word that corresponds to the character string, and then output the code word representing the character string. In addition, if the character string that is built by the dictionary encoder 14 does not match a character string in the dictionary 15, the dictionary encoder 14 will cause the character string to be added to the dictionary and a new code word (e.g., dictionary index) will be associated with that string. An encoding process according to one aspect of the present invention will be described in detail below with reference, for example, to the flow diagram of FIGS. 2A and 2B.

The encoder 12 utilizes a plurality of data storage structures 16 for temporarily storing data during an encoding process. For example, in the illustrative embodiment of FIG. 1, a Pstring data structure 17 is employed for temporarily storing a working character string, Pstring. A C data structure 18 is employed for temporarily storing a next character (byte) C in the input stream. In addition, a Pstring+C data structure 19 is used for temporarily storing a character string Pstring+C, which is a string comprising all of the characters in Pstring plus the character in C. Moreover, an Mcode data structure 23 is used for temporarily storing a code word (Mcode) (e.g., dictionary index) corresponding to a previous successful string match in the dictionary. The use of these data structures will be discussed in further detail below.
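As a rough illustration of how these working structures interact (hypothetical names; here `dictionary` maps byte strings to index numbers D[i] and `emit` stands in for writing a code word to the output stream), one step of the dictionary encoder might look as follows:

    # One step of the dictionary encoder: Pstring (17), C (18), Pstring+C (19)
    # and Mcode (23) correspond to the working variables described above.
    def dictionary_encode_step(dictionary, pstring, c, emit):
        pstring_plus_c = pstring + c                 # contents of the Pstring+C structure
        if pstring_plus_c in dictionary:
            mcode = dictionary[pstring_plus_c]       # remember the last successful match
            return pstring_plus_c, mcode             # keep extending the working string
        emit(dictionary[pstring])                    # output the code word for Pstring
        dictionary[pstring_plus_c] = len(dictionary) + 3   # add the new string (new D[i])
        return c, dictionary[c]                      # restart the working string with C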
The code word dictionary 15 comprises a plurality of dictionary indices D[i], wherein each index in the dictionary 15 is mapped (via a mapping module 20) to either a predefined control code or a different code word corresponding to a character (byte) string. The mapping module 20 preferably employs a hash function to, inter alia, map each character string (e.g., strings of one or more bytes) into a unique index D[i] in the dictionary 15 (although other mapping techniques known to those skilled in the art may be employed). As indicated above, in a preferred embodiment, the dictionary indices D[i] are output as the “code words” (also referred to herein as “Mcodes”) by the encoder to create an encoded file. These code words are processed by a decoder to decompress an encoded file (as discussed below with reference to FIGS. 3, 4A and 4B).

In a preferred embodiment, the first three entries in the dictionary 15, indices D[0], D[1], and D[2], are reserved as control codes. In particular, the entry for the dictionary index D[0], or code word “0”, is output to indicate (to the decoder) that the dictionary 15 has been reset to its initial state. As explained in detail below, the dictionary 15 is preferably reset at the commencement of an encoding process before a new input stream is processed and, preferably, during an encoding process when the total number of entries D[i] in the dictionary 15 exceeds a predetermined limit. In addition, the dictionary index D[1], or code word “1”, is utilized for the run-length encoding process. More specifically, the code word “1” is output to indicate that the next two consecutive output numbers (in the encoded sequence) represent a run-length encoding sequence comprising (1) a character code and (2) a number denoting the amount of consecutive characters found in the data stream corresponding to the character code. Furthermore, the dictionary index D[2], or code word “2”, is output to indicate the end of the data stream and completion of the encoding process.

The next 256 entries in the dictionary 15 (i.e., index numbers 3 through 258) each comprise a single character string (e.g., one byte) corresponding to one of the 256 possible character codes. Accordingly, in a preferred embodiment, the dictionary indices D[0] through D[258] are the only entries that exist in the dictionary 15 upon initialization of the dictionary 15. Any additional character strings that are dynamically added to the dictionary 15 by the dictionary encoder 14 during an encoding process will be consecutively added beginning at index D[260].
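For a byte-oriented encoder, this initial layout can be sketched as follows (the helper name and the assertion are illustrative, and the numbering shown simply places the single-byte strings immediately after the three control codes):

    # Initial dictionary state: three reserved control code words followed by
    # the 256 single-byte strings; strings learned while encoding are appended
    # after the initial entries.
    CONTROL_CODES = {0: "dictionary reset", 1: "run-length sequence", 2: "end of data"}

    def build_initial_dictionary():
        index_to_string = {3 + b: bytes([b]) for b in range(256)}   # D[3] ... D[258]
        string_to_index = {s: i for i, s in index_to_string.items()}
        return index_to_string, string_to_index

    index_to_string, string_to_index = build_initial_dictionary()
    assert string_to_index[b"A"] == 3 + ord("A")    # e.g. "A" (character code 65) is D[68]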
It is to be appreciated that, as indicated above, for a given character string under consideration, the encoder 12 will output (as a code word) the dictionary index number D[i] corresponding to a matching character string. Since the dictionary index number is usually less than two bytes and the input character strings are typically longer than six bytes, the reduction in the number of bits output can be significant.

In one embodiment of the present invention, the dictionary encoder 14 can search the code word dictionary 15 for a matching character string therein by comparing each entry in the dictionary 15 to the input character string under consideration. In certain instances, however, the amount of entries D[i] in the dictionary 15 can increase significantly, potentially rendering this search process slow, inefficient and computationally intensive. Accordingly, the data compression system 10 preferably comprises a hash table 21 which is utilized by the dictionary encoder 14 during an encoding process to reduce the search time for finding a matching character string in the dictionary 15.

More specifically, in one embodiment, the hash table 21 comprises a plurality of arrays Array[N], wherein each array comprises every dictionary index number D[i] in the dictionary 15 having an entry (i.e., character strings) that begins
with a character code corresponding to the array index. For example, the third hash table array Array[3] comprises all the dictionary indices D[i] having a dictionary entry in which the first character (byte) of the string has a decimal value of “three.” In the preferred embodiment where the encoder processes individual bytes of data in the input stream, since there are 256 possible characters, there are 256 arrays, i.e., Array[N], where N=1 . . . 256. Advantageously, the use of the hash table 21 for finding matching strings in the dictionary reduces the number of string comparisons by a factor of 256.
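A minimal sketch of this arrangement is given below (hypothetical names; the buckets here are indexed 0 through 255 by the first byte of each string, so only one bucket is scanned per lookup). When the encoder later adds a new string to the dictionary, the corresponding index would likewise be appended to the bucket for that string's first byte, which corresponds to the hash table update described below.

    # Hash table 21 sketched as 256 buckets, one per possible first byte. Each
    # bucket lists the dictionary indices D[i] of the strings that begin with
    # that byte; a lookup only compares the candidate against its own bucket.
    def build_hash_table(index_to_string):
        buckets = [[] for _ in range(256)]
        for index, string in index_to_string.items():
            buckets[string[0]].append(index)
        return buckets

    def find_code_word(candidate, buckets, index_to_string):
        for index in buckets[candidate[0]]:          # bucket chosen by the first byte
            if index_to_string[index] == candidate:
                return index
        return None                                  # no matching entry in the dictionary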
In another embodiment, the hash table 21 comprises a plurality of nested hash tables. For example, a first level of hashing can use the first character to subdivide the dictionary 15 into 256 sub-dictionaries, and a second level of hashing may use the 2nd character of the input string to further subdivide each of the initial 256 entries. Each additional level of hashing subdivides each dictionary into an additional 256 sub-dictionaries. For example, 2 levels of hashing yields 256^2 sub-dictionaries and n levels yields 256^n sub-dictionaries. The purpose of this hashing function is to reduce the time for searching the dictionary 15. For example, using an n level hashing scheme reduces the search time by 256^n-(n*256).
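The nested variant can be sketched in the same style (two levels shown; the names and the handling of single-byte strings are illustrative assumptions):

    from collections import defaultdict

    # Two-level nested hashing: strings are grouped by their first byte and then
    # by their second byte (single-byte strings fall into a separate sub-bucket),
    # so each level cuts the set of candidate strings by roughly a factor of 256.
    def build_nested_table(index_to_string):
        nested = defaultdict(lambda: defaultdict(list))
        for index, string in index_to_string.items():
            second = string[1] if len(string) > 1 else None
            nested[string[0]][second].append(index)
        return nested

    def find_code_word_nested(candidate, nested, index_to_string):
        second = candidate[1] if len(candidate) > 1 else None
        for index in nested[candidate[0]][second]:
            if index_to_string[index] == candidate:
                return index
        return None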
Furthermore, as explained in detail below with reference to the process depicted in FIGS. 2A and 2B, the hash table is dynamically modified to incorporate new entries D[i] that are added to the dictionary 15 during the encoding process.

In addition, the data compression system 10 optionally comprises a bit packing module 22 for providing additional compression of the encoded data stream. As explained above, the maximum size (i.e., number of entries D[i]) of the dictionary 15 is predefined and, consequently, the maximum number of bits of information needed to represent any index in the dictionary 15 is known a priori. For example, if the maximum dictionary size is 4000 entries, only 12 bits are needed to represent any index number. Since data is typically transferred in groups of 8 or 16 bits, in the above example where 12 bits maximum are needed to represent the index number, 4 bits out of every 16 bits would be wasted. Accordingly, to provide additional compression, the encoder 12 preferably implements the bit-packing module 22 to pack the bits of successive output code words. It is to be understood that any suitable bit-packing technique known to those skilled in the art may be employed. In a preferred embodiment, the bit-packing module employs a shift register to output at least 16 bits of data when the data is ready for output. By way of example, assume a 12-bit code word is initially input to the shift register. The next 12-bit code word that is output is also placed in the shift register, and the shift register would contain 24 bits of information. Then, 16 bits would be output from the shift register, leaving 8 bits remaining. When the next 12-bit code word is input to the shift register, the shift register will contain 20 bits, and 16 will be output. This bit packing process is repeated for every output code word until the encoding process is complete.

Advantageously, the bit packing process according to the present invention improves the compression by a factor of 16/12, or 1.33. Moreover, it is to be appreciated that the processing time required for the bit-packing is negligible. Consequently, the bit packing process provides increased compression (“algorithmic effectiveness”) without a significant increase in processing overhead (“algorithmic efficiency”).
Referring now to FIGS. 2A and 2B, a flow diagram illustrates a method for compressing data according to one
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`8
aspect of the present invention. In particular, the encoding process depicted in FIGS. 2A and 2B illustrates a mode of operation of the system 10 of FIG. 1. Initially, the dictionary 15 and hash table 21 are initialized (step 200). For example, as noted above, the dictionary 15 is initialized to include 259 entries, i.e., the first three entries D[0]—D[2] comprise the control codes and the next 256 entries D[3]—D[259] comprise the 256 possible character codes (assuming, of course, that the encoder processes data blocks each comprising a byte). Furthermore, the hash table will be initialized such that each array Array[1]—Array[N] comprises one entry—the dictionary index D[i] for the corresponding character code. Next, the Pstring data structure 17 (or “Pstring”) is initialized to be empty (i.e., it contains no characters at initialization) (step 201). It is to be understood that neither the C data structure 18 (or “C”) nor the Mcode data structure 23 (or “Mcode”) require initialization.

After the initialization process, a determination is made as to whether there are any input characters for processing (step 202). If there is input data (affirmative result in step 202), the first (or next) character (e.g., byte) in the input stream will be read and temporarily stored in C (step 203). Then, the next consecutive characters in the input stream are checked (step 204) to determine if there is a string of at least s consecutive characters that match the character stored in C to trigger a run-length sequence (step 205), where s is a predetermined minimum number of consecutive characters that are required to trigger a run-length encoding sequence.

If there are at least s consecutively similar characters in the input stream (affirmative determination in step 205), then a determination is made as to whether Pstring is empty (step 206). If Pstring is empty (affirmative determination in step 206),
