throbber
Ulllted States Patent [19]
`Franaszek et al.
`
`USOOS 87003 6A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,870,036
`*Feb. 9, 1999
`
`[54] ADAPTIVE MULTIPLE DICTIONARY DATA
`COMPRESSION
`
`341/51
`2/1995 Serussi et a1. .
`5,389,922
`5,467,087 11/1995 Chu ......................................... .. 341/51
`
`[75] Inventors: Peter Anthony Franaszek, Mt. Kisco;
`John Timothy Robinson, Y0rkt0Wn_
`gliliitsltsgl‘fglfl glgyslus Thomas’ Whlte
`
`’
`'
`'
`[73] Assignee: International Business Machines
`Corporation, Armonk, NY.
`
`[ * ]
`
`Notice:
`
`The terminal 17 months of this patent has
`been disclaimed-
`
`[21] Appl' NO‘: 393’967
`[22] Filed:
`Feb. 24, 1995
`H03M 7/00
`[51] Int C16
`_______ __ 341/51
`[52]
`[58] Field of Search ................................ .. 341/51, 65, 59,
`341/106, 107
`
`[56]
`
`_
`References Clted
`U.S. PATENT DOCUMENTS
`
`Primary Examiner—Brian K. Young
`Attorney, Agent, or Firm—Richard M. LudWin; Heslin &
`
`Rothenberg, PC.
`
`[57]
`
`ABSTRACT
`
`A system and method for compressing and decompressing
`data using a plurality of data compression mechanisms.
`Representative samples of each block of data are tested to
`select an appropriate one of the data compression mecha
`nisms to apply to the block. The block is then compressed
`using the selected one of the mechanisms and the com
`pressed block is provided With an identi?er of the selected
`mechanism For decompression, the identi?er is examined to
`select an appropriate one of the data decompression mecha
`nisms to apply to the block. The block is then decompressed
`using the selected one of the mechanisms.
`
`5,204,756
`
`4/1993 Chevion et al. .
`
`17 Claims, 9 Drawing Sheets
`
`205
`\ DATA BLETCKS
`TYPE
`TYPE
`
`EEO
`\
`
`CUMPRESSED
`BLEICKS
`835V CMD
`CMD
`
`0 0 0
`
`—>
`
`CUMPRESSEIR
`
`—>
`
`O I O
`
`\F
`210
`TYPEI DATA TYPE
`(TEXT, IMAGE, ETC.)
`(UPTIETNAL)
`
`CUMPRESSIEIN
`METHUD TABLE
`
`O 0 o
`
`RUN-LENGTH
`240/
`
`L21
`
`O O .
`
`COMPRESSED
`DATA BLEICKS
`
`840 /
`Q
`
`CMD
`
`CMD J 270x ,
`
`' ' ‘
`
`—’
`
`DATA
`DE-CIJMPRESSEIR
`
`—>
`
`v
`230
`CMD= CDMPRESSIUN
`METHEID DESCRIPTIUN
`
`DICTIUNARY
`BLETCKS
`850
`
`UNCUMPRESSED
`BLIIICKS
`
`' ' '
`
`280/
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 001
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 1 of9
`
`5,870,036
`
`
`
`mml/ wow J
`
`X ow.
`
`mmwwmmazmolwm
`
`mmmwmmninu
`
`mmwwmmmzmu
`Qmwwmm
`
`SEQ
`
`x0515
`
`Kg
`
`InEUUZD
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 002
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 2 of 9
`
`5,870,036
`
`m¥ua4m
`
`omm
`
`>m¢ZDHHo2Hk
`
`zaHmwmm¢zao,mzo
`
`
`
`zaH+mHmummmQDIFMZ
`
`ommofim
`
`m4m¢Hmmxkmz.
`
`
`
`ZDHMMMMQZDQma>»<H<m_m¢»»
`qupmmU<zH,HxmHv
`
`n4¢ZUHHmDv
`
`mmmmmaazauza
`
`eaamEEm
`
`mmmmmmmzau
`
`
`
`mxum4m¢+<m
`
`mmmmmmazmo
`
`0|.1|TOOOEEmmmEEmxoD4mmxoa4m
`mmmmmmmzaumom
`<+<Q
`
`owmomm
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 003
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 3 of 9
`
`5,870,036
`
`mmmmmmmzao
`
`¢»¢n
`
`vafim\omm
`
`mmm+_
`
`HDZmmv
`
`Ammmmmzmo
`
`wmmmmzao
`
`hummzq
`
`Q20
`
`Hum4mm
`
`Fmum
`
`QDIHMZ
`
`OHQHQJ
`
`Hxmp
`
`:u¢m
`
`naxpmz
`
`GHUHQQ
`
`mzHzmmFmm
`
`J20
`
`mmmwmmmzaoza
`
`xua4mqpqm
`
`
`
`mammmmmzau<H<Q
`
`ofim
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 004
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 4 of9
`
`5,870,036
`
`FIG. 4A
`
`DATA
`TYPE AVAILABLE
`'2
`
`SET CML=
`SET CML=
`404\ CML-TABLE
`409
`DEFAULT-CML
`<TYPE<B>>
`\
`/
`407 ____sET E Tl] ETRsT____—|
`CML ENTRY
`
`METHUD (E)
`DICTIEINARY
`BASED '?
`
`414
`
`DATA TYPE
`
`419\DESFEATULDT_—LDI—SEI=ST
`T
`SET M: METHDD<E>J
`REMUVEi E FRDM CML"—
`T
`EUR EACH D
`424
`\__ IN D—LIST:
`
`421
`
`'
`
`SET IHJST:
`D-LIST-TABLE
`<TYPE<B>>
`
`ADD (MD)
`TU CML
`
`427j
`
`WAS E
`LAST ENTRY IN
`CML '?
`
`YES
`
`TEST
`
`SET E Tl]
`NEXT ENTRY 431
`IN CML
`-/
`l
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 005
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 5 of9
`
`5,870,036
`
`FIG. 48
`
`T3“
`
`434
`SET E TU FIRST
`CML. ENTRY,- 1:1 _/
`
`437
`
`NE]
`
`METHUD (E)
`DICTIONARY
`BASED '2
`
`YES
`
`439
`
`/
`
`SET M=METHE|D (E);
`D=DICTIE1NARY (E)
`
`444
`/
`CUMPRESS B001, I02)
`USING METHEID (E)
`INTEI K BYTES
`
`CUMF’RESS B(k31, be)
`USING METHUD M &
`DICTIUNARY D
`INTI] K BYTES
`
`K 441
`
`447 \
`
`SET
`CRTT<I>=K
`
`NE]
`
`451 \
`SET E TEI
`NEXT ENTRY IN
`CML,‘ I=I+1
`
`WAS E
`LAST ENTRY IN
`CML '?
`
`YES
`
`FIND Q SUCH
`THAT CRTT<Q>
`IS MIN(CRTT)
`
`454
`
`457
`
`IS
`CRTT<Q> <
`THRESHULD
`7
`
`YES
`
`CEIMPRESS
`
`459
`DEI NUT
`\
`CEIMPRESS B
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 006
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 6 of9
`
`5,870,036
`
`CEIMPRESS
`
`461
`SET E TU Q’TH
`ENTRY IN CML _/
`
`NI]
`
`470
`\
`CUMPRESS B INTII B’
`USING METHUD <E>
`
`METHUD (E)
`DICTIEINARY
`BASED 'e
`
`464
`
`YES
`
`467
`f
`CDMPRESS B INTU B’
`USING METHIJD <E> &
`DICTIUNARY (E)
`
`_1l___
`
`STURE E IN [IMD
`AREA IIIF B’ /475
`
`I
`
`OUTPUT B’
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 007
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 7 0f 9
`
`5,870,036
`
`0mm
`
`m .9;
`
`
`
` mmwwmmazmulmm qkqm K ohm
`
`
`
`\lomm WQHW
`
`mmwwmmmzmo / 0mm
`
`.lllvGmIm 3km
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 008
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 8 of9
`
`5,870,036
`
`210\/UNCDMPRES/\SED BLEICKS
`T
`STlIIRE (220
`
`P
`RETRIIEVE/270
`
`‘
`DATA
`CUMPRESSUR =
`
`"
`
`DATA
`DEI-CIJMPRESSDR
`
`COMPRESSION
`METHEID TABLE 111cm
`
`y
`
`r
`
`ARITHMETIC
`
`N
`
`/.
`600
`/- RUN-LENGTH N
`
`BLEICK ID
`__
`M’ZEDR'
`
`mm 601
`~I
`/
`DIRECTORY 60g STATIC DI
`
`>
`
`@T.
`
`Y
`
`230
`W‘
`
`COMPRESSED
`BLOCKS
`
`\
`
`,
`
`j
`630
`620
`/
`MEMEIRY
`DICTIONARY
`BLEICKS
`
`640
`
`DICTF-‘lS‘b
`
`Y
`
`603/ SUB-BLOCK
`0 D1cT.=cATENATE Y
`6 14/ SUB-BLOCKS
`/‘
`240
`
`/
`240a
`
`2250
`,1
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 009
`
`

`
`U.S. Patent
`
`Feb. 9, 1999
`
`Sheet 9 of 9
`
`5,870,036
`
`:+wzm4-z:m
`
`uH+m:IHHm<
`
`uH+mz:»~m¢
`
`zaHmmmm¢:ao
`
`:+uzm4:z:m
`
`zaHmmmm¢:mo
`
`Qmama
`
`>m¢ZDHHuHQ
`
`zmHwmmmmzmo
`
`Qmemo
`
`>m<ZDHHoHm
`
`mmmmmmazau
`
`ymazmz
`
`m¥ua4m
`
`mxoa4m
`
`
`38%zamzq?_mmm8¢222$._emu
`IIIEEG:4
`
`
`
`m>mHm+mmmmmkm
`
`mom
`
`mmmwmmmzaoz:
`
`
`
`m¥om4mqpqm
`
`N.O_n_
`
`_>m¢ZDHHo:HDH¥om4m__Jmmkzmo
`
`mmzm:mmmuuqzmmzqm
`
`
`aqz_mmmmmmq
`11>mazm24
`_>4zm-m¢mm_
`IK%\ofim
`
`mama__MP3
`
`omm1;VI
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 010
`
`

`
`1
`ADAPTIVE MULTIPLE DICTIONARY DATA
`COMPRESSION
`
`I. BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`This invention relates in general to data processing sys
`tems and, more speci?cally, to a system and method for
`compressing data.
`2. Background of the Invention
`In general, data compression involves taking a stream of
`symbols and transforming them into codes. If the compres
`sion is effective, the resulting stream of codes Will be smaller
`than the original symbol stream. The decision to output a
`certain code for a certain symbol or set of symbols is based
`on a model. The model is simply a collection of data and
`rules used to process input symbols and determine Which
`code(s) to output. A computer program may use the model
`to accurately de?ne the probabilities for each symbol in
`order to produce an appropriate code based on those prob
`abilities.
`Data compression techniques often need to be lossless
`(Without incidences of error or loss of data). Exceptions to
`this include, for example, certain applications pertaining to
`graphic images or digitiZed voice. Lossless compression
`consists of those techniques guaranteed to generate an exact
`duplicate of the input data stream after a compress/expand
`cycle. This is the type of compression often used When
`storing database records, spreadsheets, or Word processing
`?les. In these applications, the loss of even a single bit can
`be catastrophic.
`Lossless data compression is generally implemented
`using one of tWo different types of modeling: statistical or
`dictionary-based. Statistical modeling reads in and encodes
`a single symbol at a time using the probability of that
`character’s appearance. Statistical models achieve compres
`sion by encoding symbols into bit strings that use feWer bits
`than the original symbols. The quality of the compression
`goes up or doWn depending on hoW good the program is at
`developing a model. The model has to predict the correct
`probabilities for the symbols. The farther these probabilities
`are from a uniform distribution, the more compression that
`can be achieved.
`In dictionary-based modeling, the coding problem is
`reduced in signi?cance, making the model supremely impor
`tant. The dictionary-based compression processes use a
`completely different method to compress data. This family
`of processes does not encode single symbols as variable
`length bit strings; it encodes variable-length strings of
`symbols as single pointers. The pointers form an index to a
`phrase dictionary. If the pointers are smaller than the phrases
`they replace, compression occurs. In many respects,
`dictionary-based compression is easier for people to under
`stand. In every day life, people use phone numbers, DeWey
`Decimal numbers, and postal codes to encode larger strings
`of text. This is essentially What a dictionary-based encoder
`does.
`In general, dictionary-based compression replaces
`phrases With pointers. If the number of bits in the pointer is
`less than the number of bits in the phrase, compression Will
`occur. HoWever, the methods for building and maintaining a
`dictionary are varied.
`A static dictionary is built up before compression occurs,
`and it does not change While the data is being compressed.
`For example, a database containing all motor-vehicle reg
`istrations for a state could use a static dictionary With only
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`55
`
`60
`
`65
`
`5,870,036
`
`2
`a feW thousand entries that concentrate on Words such as
`“Ford,” “Jones,” and “1994.” Once this dictionary is
`compiled, it is used by both the encoder and the decoder as
`required.
`There are advantages and disadvantages to static dictio
`naries. Nevertheless, dictionary-based compression schemes
`using static dictionaries are mostly ad hoc, implementation
`dependent, and not general purpose.
`Many of the Well-knoWn dictionary-based processes are
`adaptive. Instead of having a completely de?ned dictionary
`When compression begins, adaptive schemes start out either
`With no dictionary or With a default baseline dictionary. As
`compression proceeds, the processes add neW phrases to be
`used later as encoded tokens.
`For a further discussion of data compression in general,
`please refer to The Data Compression Book, by Mark
`Nelson, @ 1992 by M&T Publishing, Inc., Which is hereby
`incorporated by reference herein.
`As mentioned, the history of past symbols of a sequence
`often provides valuable information about the behavior of
`the sequence in the future. Various universal techniques have
`been devised to use this information for data compression or
`prediction. For example, the Lempel-Ziv (“LZ”) compres
`sion process, Which is discussed Within Compression of
`Individual Sequences by Variable Rate Coding, by J. Ziv and
`A. Lempel, IEEE Trans. Inform. Theory, IT-24: 530—536,
`1978 (Which is incorporated by reference herein), uses the
`past symbols to build up a dictionary of phrases and com
`presses the string using this dictionary. As Lempel and Ziv
`have shoWn, this process is universally optimal in that the
`compression ratio converges to the entropy for all stationary
`ergodic (of or related to a process in Which every sequence
`or siZeable sample is equally representative of the Whole)
`sequences. Thus, given an arbitrarily long sequence, such
`compression operates as Well as if the distribution of the
`sequence Was knoWn in advance.
`The Lempel-Ziv compression method has achieved great
`popularity because of its simplicity and ease of implemen
`tation (actually, Lempel-Ziv is often used to denote any
`dictionary based universal coding scheme, as a result, the
`standard method described herein is only one of this large
`class). It asymptotically achieves the entropy limit for data
`compression. HoWever, the rate of convergence may be sloW
`and there is scope for improvement for short sequences. In
`particular, at the end of each phrase, the process returns to
`the root of the phrase tree, so that contextual information is
`lost. One approach to this problem Was suggested by
`Plotnik, Weinberger and Ziv for ?nite state sources, as
`described Within Upper Bounds on the Probability of
`Sequences Emitted by Finite-State Sources and on the
`Redundancy of the Lempel-ZivAlgorithm, by E. Plotnik, M.
`J. Weinberger and J. Ziv, IEEE Trans. Inform. Theory,
`IT-38(1): 66—72, January 1992, Which is incorporated by
`reference herein. Their idea Was to maintain separate LZ like
`trees for each source state (or estimated state) of a ?nite state
`model of the source. Plotnik, Weinberger and Ziv shoWed
`that this procedure is asymptotically optimal.
`U.S. patent application Ser. No. 08/253,047, ?led on Jun.
`2, 1994 and assigned to the same assignee as the present
`invention describes a compression system Wherein contex
`tual information is implemented in conjunction With a
`dictionary-based compression process Within a data process
`ing system. Before encoding a next phrase Within an input
`string of data, the compression system, through dynamic
`programming, derives the statistically best set of dictionaries
`for utiliZation in encoding the next phrase. The selection of
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 011
`
`

`
`5,870,036
`
`3
`this set of dictionaries is dependent upon each dictionary’s
`past utilization Within the process. Thus, the choice of
`dictionaries is directly related to each of their performance
`on compression, and not on a presumed model for the
`sequence as is required in a compression technique utilizing
`a state machine description of the source, Which is available
`to both the encoder and decoder. The selected set of dictio
`naries is implemented by the data processing system to
`encode the next phrase of data. The context of the phrase is
`used to select a particular dictionary Within the selected set
`for the encoding process, Which computes a pointer to the
`phrase Within the dictionary that corresponds to the next
`phrase.
`The above-technique has the property that the compressor
`and de-compressor can each compute from the preceding
`string, the dictionary Which Will be used for the next phrase.
`This has the advantage that the identity of the dictionary
`need not be included. HoWever, the chosen dictionary may
`not in fact be the best one to use, as the decision of Which
`to use is based on estimates. Further, in some cases
`improved compression could result from using other types
`of compression techniques.
`
`II. SUMMARY OF THE INVENTION
`
`It is an object of this invention to dynamically compress
`data blocks by using the best of a given set of methods, and
`for dictionary-based compression, by using the best of a
`given set of dictionaries.
`In accordance With one aspect of the present invention,
`there is provided a system and method for compressing data
`using a plurality of data compression mechanisms. Repre
`sentative samples of each block of data are tested to select
`an appropriate one of the data compression mechanisms to
`apply to the block. The block is then compressed using the
`selected one of the mechanisms and the compressed block is
`provided With an identi?er of the selected mechanism.
`In accordance With another aspect of the present invention
`there is provided a system and method for decompressing
`data. using a plurality of data decompression mechanisms.
`Each block of data includes a coding identi?er Which is
`indicative of the method or mechanism used to compress the
`block. The coding identi?er is examined to select an appro
`priate one of the data decompression mechanisms to apply
`to the block. The block is then decompressed using the
`selected one of the mechanisms.
`
`III. BRIEF DESCRIPTION OF THE DRAWING
`
`FIG. 1 shoWs a system structure suitable for use in
`conjunction With the present invention;
`FIG. 2 shoWs a data compressor and data de-compressor
`270 in accordance With the principles of the present inven
`tion;
`FIG. 3 shoWs the overall structure of the data compressor
`of FIG. 2;
`FIGS. 4A, 4B and 4C are a How chart of the operation of
`the data compressor of FIG. 3;
`FIG. 5 shoWs the overall structure of the data decompres
`sor of FIG. 2;
`FIG. 6 shoWs a storage system incorporating compression
`& decompression in accordance With the principles of the
`present invention; and
`FIG. 7 is a block diagram of an encoder/decoder in a
`compression system, in accordance With the principles of the
`present invention.
`Like reference numerals appearing in multiple ?gures
`represent like elements.
`
`4
`IV. DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
`Asystem structure suitable for use in conjunction With the
`present invention is shoWn in FIG. 1. The system includes a
`CPU 5 Which accesses a ?rst memory 10 containing uncom
`pressed data blocks 15. Within the same information pro
`cessing system as the CPU 5 or Within another “remote”
`system, there is a second memory 20. The memories 10, 20
`can be semiconductor memories, magnetic storage (such as
`a disk or tape), optical storage or any other suitable type of
`information storage media. The data blocks are transferred
`betWeen the ?rst and second memories.
`To increase the number of data blocks that can be stored
`in the second memory 20 given a ?xed second memory siZe,
`data blocks 25 may be stored in a compressed format in the
`second memory. For this purpose there is a compressor 30
`that compresses data blocks as they are transferred to the
`second memory, and a de-compressor 40 that de-compresses
`data blocks as they are transferred to the ?rst memory.
`In the present description, the Word “coding” is used to
`refer generically to any of encoding (compressing) or decod
`ing (decompressing). The Word “CODEC” refers to an
`apparatus that performs both encoding and decoding.
`FIG. 2 shoWs a data compressor 220 and data
`de-compressor 270 Which operate in accordance With the
`principles of the present invention. The data compressor 220
`and de-compressor 270 and the associated logic take the
`places of, respectively, the compressor 30 and decompressor
`40 of FIG. 1. In this embodiment, the uncompressed data
`blocks 210 that can optionally contain type information 205.
`The type information can be, for examples, image data
`encoded in a given format, source code for a given pro
`gramming language, etc. The data blocks 210 are input to the
`data compressor 220.
`The data compressor 220 and data de-compressor 270
`share a compression method table 240 and a memory 250
`containing a number of dictionary blocks 1,2,n. It should be
`understood that the table 240 and the memory 250 can each
`be a single shared entity or, alternatively, can be duplicated
`such that one of each is located locally at the data compres
`sor 220 and the data decompressor 270. The compression
`method table 240 can include, for example, of an array of
`pointers to compression routines or device address of
`compression/decompression hardWare components. Each
`method entry in the table 240 includes an identi?er 240a
`indicated Whether the method is non-dictionary-based or
`dictionary-based.
`Dictionary blocks are identi?ed by an index specifying
`the offset of the block in the dictionary block memory. As
`explained more fully beloW, the compressor 220 selects a
`compression method. For dictionary-based methods the
`compressor 220 also selects a dictionary block, to compress
`the data. The compressor outputs compressed data blocks
`230, With an index (M) 232 identifying the selected com
`pression method, and for dictionary-based methods, dictio
`nary block identi?er (D), encoded in a compression method
`description (CMD) area 235 in the compressed block.
`The use of a dictionary for data compression and decom
`pression is Well knoWn in the art and described, for example,
`in US. patent application Ser. No. 08/253,047, ?led on Jun.
`2, 1994 and in US. Pat. No. 4,814,746, both of Which are
`incorporated by reference herein as if printed in full beloW.
`Compressed data blocks 230, With the compression
`method identi?er M and for dictionary-based methods dic
`tionary block identi?er D encoded in the CMD area 235 are
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`55
`
`60
`
`65
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 012
`
`

`
`5,870,036
`
`5
`input to the de-compressor 270. The de-compressor 270
`de-compresses the block using the speci?ed method found in
`the compression method table 240 (using the compression
`method identi?er as an index), and for dictionary-based
`methods, speci?ed dictionary block found in the dictionary
`block memory 250, and outputs uncompressed data blocks
`280.
`FIG. 3 shoWs the overall structure of the data compressor
`220. Each component is explained more fully beloW. Given
`an uncompressed block 205 as input, ?rst, in block 310, a
`compression method list (CML) is determined. The CML is
`an array of entries of the form M or (M,D), Where the latter
`form is used for dictionary-based methods. M is an index
`into the compression method table 240 previously described
`With respect to FIG. 2, and D is a dictionary block identi?er
`for a dictionary block contained in the dictionary block
`memory 250, also previously described With respect to FIG.
`2.
`
`Next, in block 320, each method or (method,dictionary)
`pair is tested on a sample taken from the uncompressed data
`block 205, and the resulting sample compression is saved. In
`block 330, the best method or (method,dictionary) pair is
`found (i.e., the one giving the most compression). The best
`method is readily determined by comparing the siZe of the
`uncompressed same to the siZe of the compressed sample for
`each method. In a preferred embodiment, the siZe of the
`sample block as compressed by each of a number of methods
`is stored in a table and the best method is determined by
`examining the table to ?nd the smallest compressed block.
`If the best sample compression does not satisfy a threshold
`condition (eg 30% compression as compared to the uncom
`pressed sample), then it is decided at this point not to
`compress the block as indicated by 335. In this case the
`block Will be stored in uncompressed format. OtherWise, in
`block 340, the block is compressed using the best method
`(and if applicable dictionary) found in block 330, the method
`or (method,dictionary) pair is encoded in the CMD area 235
`of the block, and the compressed data block 230 results as
`output.
`For dictionary compression methods, one Way to test a
`representative sample is to determine the compression ratios
`of a given number (eg 30) phrases. It should be understood
`that all of the phrases could be tested. For symbol based
`compression (such as Huffman or Arithmetic coding) the
`compression of a given number (eg 150) or percentage (eg
`10% to 20%) of the symbols can be tested.
`The operation of the data compressor 220 is shoWn in
`FIGS. 4A, 4B, and 4C.
`In step 401, if a data type (eg text, image, etc.) for a
`given uncompressed block B is available, in step 404 the
`Compression Method List (CML) is set to a list of com
`pression methods that have been preselected for that data
`type. OtherWise, if no data type is available, in step 407 the
`CML is set to a default list of compression methods.
`Next, the folloWing steps are done for each dictionary
`based method M in the CML. This involves checking each
`entry E in the CML. First, in step 409, E is set to the ?rst
`CML entry. Next, in step 411, it is determined if the method
`of E is dictionary-based. If it is not, step 429 is executed,
`Where it is checked if E Was the last CML entry. If it Was not,
`in step 431 E is set to the next CML entry, and the method
`returns to step 411. If step 429 determines that E Was the last
`CML entry, the loop is exited, and the method proceeds to
`step 434.
`If the test in step 411 determines that the method of E Was
`dictionary-based, the folloWing steps are performed, With
`control passing to step 429 at the completion of these steps.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`55
`
`60
`
`65
`
`6
`In step 414, it is determined if a data type is available (ie
`the block includes a “data type” entry in the type ?eld 205),
`If a data type is available, in steps 417, 421, 424, and 427,
`the CML is expanded by replacing E With the list (M,D1),
`(M,D2), .
`.
`.
`, (M,Dj), Where (D1, .
`.
`.
`, Dj) is a list of
`dictionary block identi?ers that have been preselected for
`the data type When using compression method M.
`OtherWise, if no data type is available), steps 419, 421, 424,
`and 427, replace E With the list (M,D1‘), (M,D2‘), .
`.
`.
`,
`(M,Dk‘), Where (D1‘, .
`.
`. , Dk‘) is a default list of dictionary
`block identi?ers for compression method M.
`Next, steps 437, 439, 441, 444, and 447, are done for each
`entry E of the form M or (MD) in the CML (Where the latter
`entry is for dictionary-based methods).
`In step 434, E is set to the ?rst entry in the CML, and I,
`Which Will be used as the entry number, is set to 1. At the
`completion of steps 437—447, in step 449 it is checked if E
`Was the last CML entry. If it Was not, E is set to the next
`CML entry and I is incremented in step 451, With control
`returning to block 437. OtherWise the loop is exited, With
`control passing to step 454.
`Steps 437—447 comprise the body of the loop. Steps 437,
`439, 441, and 444, apply compression method M (and
`dictionary D if M is dictionary-based) on a sample from B,
`B(b1,b2) consisting of bytes b1 through b2 consecutively in
`B, resulting in a compressed sample of length K bytes. Next,
`in step 447, the resulting compressed sample length K is
`saved as entry CRTT(I) in compression ratio test table
`CRTT, Where this table is an array of integers. Note: the
`above compression tests are independent, and although the
`?gure shoWs a sequential loop, implementations are possible
`in Which all these tests are performed in parallel.
`Next, in step 454, the smallest compressed length in the
`table CRTT is found, say entry CRTT(Q). In step 457, the
`smallest compressed length is compared against a threshold.
`If all uncompressed input blocks are a given ?xed siZe, the
`threshold can be constant; otherWise, the threshold can be
`computed as a function of the input block siZe. For example,
`the threshold can be computed as a given fraction of the siZe
`of the block currently being compressed.
`If step 457 determines that CRTT(Q) is not suf?ciently
`small, in step 459 the data block B is not compressed.
`OtherWise, if step 457 determines that CRTT(Q) is suf?
`ciently small, in step 461, E is set equal to the Q’th entry in
`the CML, Where E is of the form M or (M,D). In steps 464,
`467, and 470, B is compressed using M, and if M is
`dictionary-based, dictionary D, into a block B‘. Next, in step
`457, E is recorded in the CMD (compression method
`description) pre?x area of the compressed block (B‘), and the
`resulting block B‘ is output.
`FIG. 5 shoWs the structure of the decompressor 270.
`Given a compressed block 230 as input, ?rst, in block 510
`the compression method used to compress the block is found
`(by decoding the CMD ?eld), along With the dictionary
`identi?er of the dictionary used if the method is dictionary
`based. Next, in block 520, the compression method and
`dictionary (if applicable) is used to decompress the block,
`resulting in the uncompressed data block 280 as output.
`FIG. 6 illustrates an application of the principles of the
`present invention to a storage system in Which there is a
`random access memory 620 into Which arbitrary blocks of
`data are stored and retrieved. In order to increase the total
`number of bytes of uncompressed data blocks that can be
`stored before the memory becomes full, uncompressed
`blocks 210 are compressed by a data compressor 220 before
`being stored as compressed blocks 230, and de-compressed
`by a data de-compressor 270 When they are retrieved.
`
`Veritas Techs. LLC
`Exhibit 1004
`Page 013
`
`

`
`5,870,036
`
`7
`In this example, part of the memory 620 is used to store
`dictionary blocks 250. The dictionary blocks include some
`number of ?xed, static blocks, determined before hand, and
`also of some number of blocks that are created and subse
`quently deleted in a fashion subsequently to be described.
`For simplicity, assume that all uncompressed data blocks
`are a given ?xed siZe, say 4096 bytes. Each such block can
`be considered to consist of 8 512-byte sub-blocks, for
`example. In some cases it may happen that using the ?rst
`such sub-block as a dictionary may yield better compression
`than any of the ?xed static dictionaries. This method is
`shoWn as 602 in the compression method table 240.
`Alternatively, take the ?rst 64 bytes of each sub-block, and
`concatenating these into a 512-byte dictionary, could be the
`best dictionary for some blocks having certain phrase pat
`terns. This method is indicated by 603 in the ?gure. The
`other three methods indicated in the ?gure are arithmetic
`coding 600, run-length coding 601, and LZ1 using one of the
`?xed set of static dictionaries 602.
`The data compressor operates as previously described.
`HoWever, if the best compression method is found to be that
`using the ?rst sub-block 602 or that using a concatenation of
`multiple sub-blocks 603, the dictionary is stored as one of
`the dictionary blocks 250 at this time, the dictionary direc
`tory 640 is updated accordingly using standard techniques,
`and the identity of the dictionary block as given by the
`dictionary directory is stored in the CMD. In any case, the
`compressed data block 230 is stored in the memory 620, and
`the mapping of block identi?ers to memory addresses 630 is
`updated using standard techniques. Conversely, the data
`de-compressor operates as previously described, hoWever
`When a compressed block is retrieved, if it is found that
`either the ?rst sub-block or concatenation of multiple sub
`blocks methods Was used, then if the compressed data block
`is also being removed from the memory, the dictionary is
`also removed from the set of dictionary blocks at this time.
`FIG. 7 is a block diagram of an encoder/decoder in a
`compression system, in accordance With the principles of the
`present invention. The system of FIG. 7 includes a micro
`processor (Up) 708 Which controls the encoding and decod
`ing of the data blocks under control of program code 712
`stored in a read only memory 710. The program code
`implements the methods of FIGS. 4A—4C and the decom
`pression of FIG. 5. The system also includes CODECs 702,
`704, 706 Which perform both the encoding and decoding
`functions as determined by control data sent from the
`microprocessor 708. The CODECs can be embodied in
`hardWare logic circuits or as program code executed by
`microprocessor 708. The compression method table 240 is
`stored in a read only memory 714 Which is addressed and
`read by the microprocessor. The block address ID map 630,
`the dictionary directory 640, the dictionary blocks 250 and
`the compressed blocks 230 are stored in random access
`memories 716, 718, 720 (Which can be multiple memories or
`address spaces in a single memory).
`The present system and method can be modi?ed and
`extended in a number of Ways. For example multiple dic
`tionary blocks can be used as a logical single dictionary. In
`such an embodiment, a set of dictionary blocks, say D1, D2,
`.
`.
`.
`, Dn, can be used by a dictionary-based compression
`method, say IJempel-Ziv 1 (LZ1), as if the ordered set is
`logically a single dictionary. In this case D1, D2, .
`.
`.
`, Dn
`are considered to be catenated into a single logical dictio
`nary D, and pointers in the compressed output refer to
`substrings in one of the n actual dictionary blocks.
`As another modi?cation, the set of dictionary blocks used
`by the compressor and de-compressor need not be ?xed to
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`50
`
`a O
`
`65
`
`8
`a predetermined set. For example, this set could be initially
`empty, and dictionaries generated by a dictionary-based
`compression method could be added to the set as subsequent
`blocks are compressed, until a maximum number of dictio
`nary blocks is reached.
`Furthermore, if there are currently no compressed blocks
`requiring a given dictionary block for de-compression (this
`can be determined by examining the CMD ?elds), the given
`dictionary block can be removed from the current set of
`dictionary blocks.
`Among other possibilities, the set of dictionary blocks
`could be managed using an LRU type policy: in order to add
`a neW dictionary block, the least-recently-used dictionary
`block (not currently required for de-compression) could be
`selected to be replaced.
`Although the present invention and its advantages have
`been described in detail, it should be understood that various
`changes, substitutions and alterations can be made herein
`Without departing from the spirit and scope of the i

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket