`(10) Patent No.:
`US 6,624,761 132
`
`Fallon
`(45) Date of Patent:
`*Sep. 23, 2003
`
`USOO6624761B2
`
`(54) CONTENT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`(75)
`
`Inventor:
`~
`.
`(73) Assignee.
`.
`( 4 ) Notice:
`
`James J. Fallon, Armonk, NY (US)
`-
`liegltlme Data, LLC, New York, NY
`(
`)
`.
`.
`.
`.
`Subject. to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U~S~C~ 154(b) by 0 days
`
`This patent is subject to a terminal dis-
`claimer.
`
`(21) APP1~ N05 10/016s355
`~
`.
`(22) Filed.
`(65)
`
`Oct. 29, 2001
`Prior Publication Data
`
`US 2002/0097172 A1 Jul. 25, 2002
`
`Related US. Application Data
`
`(63)
`
`Continuation—in—part of application No. 09/705,446, filed on
`NOV. 3, 2000, now Pat. No. 6,309,424, which is a continu—
`ation of application No. 09/210,491, filed on Dec. 11, 1998,
`now Pat. No. 6,195,024.
`
`(51)
`
`Int. Cl.7 ......................... G06F 13/12; G06F 13/38;
`H03M 7/34; H03M 7/38
`
`............................................ 341/51; 710/68
`(52) US. Cl.
`(58) Field of Search ............................ 341/51, 67, 106,
`341/50; 79; 382/166, 100; 239; 71052
`68; 348/458; 707/101; 235/431; 709/208;
`704/245; 375/240
`
`(56)
`
`References Cited
`
`U’S’ PATENT DOCUMENTS
`4,682,150 A *
`7/1987 Mathes et a1.
`.............. 235/431
`4,872,009 A
`10/1989 Tsukiyama et a1.
`4,929,946 A
`5/1990 O’Brien et a1.
`590459852 A
`9/1991 MitCheH 6t 91-
`5,097,261 A
`3/1992 Langdon, Jr. et a1.
`5,212,742 A
`5/1993 Normile et a1.
`
`5,231,492 A
`5,237,675 A *
`5,243,341 A
`
`7/1993 Dangi et a1.
`................. 710/68
`8/1993 Hannon, Jr.
`9/1993 Seroussi et a1.
`
`57243348 A
`5,270,832 A
`5,379,036 A
`5,381,145 A
`5,394,534 A
`5461679 A
`5,467,087 A
`5,471,206 A
`5,479,587 A
`5,486,826 A
`5,495,244 A
`5,533,051 A
`5,557,749 A *
`5,583,500 A
`5,627,534 A
`5,654,703 A
`5,668,737 A
`5,717,393 A
`5,717,394 A
`5,729,228 A
`5,748,904 A
`5,771,340 A
`5,784,572 A
`5,799,110 A
`5,805,932 A
`
`9/1993 JaCkson .
`12/1993 Balkanskl et a1.
`1/1995 Storer
`“1995 Allen et a1.
`2/1995 Kulakowski et a1.
`101995 N
`'1
`t
`l.
`“$1995 Cfilrlml e e a
`“/1995 Allen et a1.
`12/1995 Campbell et a1.
`1/1996 Remillard
`2/1996 Jeong et a1.
`7/1996 James
`9/1996 Norris ........................ 709/208
`12/1996 Allen et a1.
`5/1997 Craft
`8/1997 Clark, 11
`9/1997 Iler
`2/1998 Nakano et a1.
`2/1998 Schwartz et al.
`3/1998 Franaszek et a1.
`5/1998 Huang et a1.
`6/1998 Nakazato et a1.
`7/1998 Rostoker et a1.
`8/1998 Israelsen et a1.
`9/1998 Kawashima et a1.
`.
`.
`(List continued on next page.)
`Primary Examiner—Patrick Wamsley
`(74) Attorney, Agent, or Firm—F. Chau & Associates, LLP;
`Frank V. DeRosa, Esq.
`
`(57)
`
`ABSTRACT
`
`Systems and methods for providing fast and efficient data
`compression using a combination of content independent
`data compression and content dependent data compression.
`In one aspect, a method for compressing data comprises the
`steps of: analyzing a data block of an input data stream to
`identify a data type of the data block, the input data stream
`comprising a plurality of disparate data types; performing
`content dependent data compression on the data block, if the
`data type of the data block is identified; performing content
`independent data compression on the data block, if the data
`type of the data block is not identified.
`
`22 Claims, 34 Drawing Sheets
`
`Content Dependent Encoders
` E
`
`
`
`Data Black
`Counter
`
`
`
`
`
`n erDmE
`Content
`
`
`Dependent
`input Data
`Butler
`Data
`
`
`Content Independent
`N0
`Encoders
`Recognition
`
`
`
`Data/File
`
`
`Recognition
`
`List(e) or
`Algorithms)
`
`1310
`
`
`Commvault Ex. 1014
`Commvault v. Realtime
`
`US Patent No. 9,054,728
`
`Page 1
`
`Page 1
`
`Commvault Ex. 1014
`Commvault v. Realtime
`US Patent No. 9,054,728
`
`
`
`US 6,624,761 B2
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`5,809,176 A
`5,818,368 A
`5,818,530 A
`5,819,215 A
`5,825,424 A
`5,847,762 A
`5,861,824 A
`
`9/1998
`10/1998
`10/1998
`10/1998
`10/1998
`12/1998
`1/1999
`
`Yajima
`Langley
`Canfield et al.
`Dobson et al.
`Canfield et al.
`Canfield et al.
`Ryu et al.
`
`6/1999
`5,917,438 A
`10/1999
`5,964,842 A
`11/1999
`5,991,515 A
`2/2000
`6,031,939 A
`2/2001
`6,195,024 B1 *
`6,309,424 B1 * 10/2001
`6,529,633 B1 *
`3/2003
`
`Ando
`Packard
`Fall et al.
`Gilbert et al.
`Fallon ......................... 341/51
`
`Fallon ................ 341/51
`Easwar et al.
`.............. 382/239
`
`* cited by examiner
`
`Page 2
`
`Page 2
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 1 0f 34
`
`US 6,624,761 B2
`
`INPUT DATA STREAM
`
`IDENTIFY INPUT DATA TYPE AND
`GENERATE DATA TYPE IDENTIFICATION
`
`2
`
`DATA TYPE
`
`ID SIGNAL
`
`SIGNAL
`
`
`“—n—“w—namwmucl
`WITH IDENTIFIED DATA TYPE
`
`COMPRESS DATA IN ACCORDANCE WITH
`
`3
`
`IDENTIFIED DATA TYPE
`
`COMPRESSED DATA STREAM
`
`RETRIEVE DATA TYPE
`
`INFORMAgLOggiggm/IPRESSED
`
`DECOMPRESS DATA IN ACCORDANCE
`
`1...
`
`I I :
`
`I I I I I
`
`FIG. 1
`
`PRIOR ART
`
`Page 3
`
`Page 3
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 2 0f 34
`
`US 6,624,761 132
`
`SkQQNQOUZW
`
`\\S3<mthw
`
`mosl‘mome
`
`ZOEmmMmEOO
`
`mat
`
`20:.anme
`
`ow
`
`ZOmE<m§OU
`
`20.223350mmmkzsooI«WEB
`zoammEESSm;
`
`9.5%Ewmuammm«mucozm<36
`
`<._.<D
`
`XOOJm
`
`3.3.2300
`
`om
`
`9
`
`EmpiDm
`
`w«9.2300
`
`Emu—"Sm
`
`Nme2300
`
`Fmmwnouzw
`
`NmmmDOuzm
`
`Ec‘mtkmGR«6
`
`Rmmmzm
`
`
`
`:MMFZDOOcmmeOozm
`
`N.GE
`
`ov.on
`
`Page 4
`
`Page 4
`
`
`
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 3 0f 34
`
`US 6,624,761 B2
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`
`INPUT DATA STREAM
`
`COUNT SIZE OF
`
`DATA BLOCK
`
`300
`
`302
`
`B
`
`BUFFER DATA BLOCK
`
`COMPRESS DATA
`BLOCK WITH
`, ENABLED ENCODERS
`
`ENCODER
`
`' BUFFER ENCODED
`DATA BLOCK OUTPUT
`FROM EACH
`
`-
`
`COUNT SIZE OF
`ENCODED DATA
`
`BLOCKS
`
`CALCULATE
`COMPRESS‘ON
`
`RAT‘OS
`
`COMPARE
`COMPRESSKON
`RATlOS WH’H
`
`THRESHOLD LIMIT
`
`310
`
`31 2
`
`314
`
`FIG. 38
`
`Page 5
`
`Page 5
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 4 0f 34
`
`US 6,624,761 B2
`
`A
`
`316
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`
`
`
`NO
`
`
`
`
`APPEND NULL
`SELECT ENCODED
`
`DESCRIPTOR TO
`DATA BLOCK WITH
`
`
`
`UNENCODED INPUT
`GREATEST
`
`
`
`DATA BLOCK
`COMPRESSION RATIO
`
`
`
`
`
`
`APPEND
`
`CORRESPONDING
`
`DESCRIPTOR
`
`
`
` OUTPUT UNENCODED
`
`OUTPUT ENCODED V
`DATA BLOCK WITH
`DATA BLOCK WITH .
`
`
`NULL DESCRIPTOR
`
`
`_ DESCRIPTOR
`
`318
`
`320
`
`NO
`
`DATA BLOCKS IN INPUT
`STREAM?
`
`
`
`
`
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`
`STREAM
`
`
`330
`
`FIG. 3b
`
`Page 6
`
`YES 332
`
`
`
`TERMINATE DATA
`COMPRESSION
`PROCESS
`
`Page 6
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 5 0f 34
`
`US 6,624,761 B2
`
`8
`
`
`
`20.22.25th$.99:
`
`
`
`”.0mmaommmooozm
`
`
`
`Ems.tjaéawo
`
`
`
`Emu".:m
`
`cme2300,
`
`cmmmnoozm
`
`2.on
`
`(.55QWQOOZM
`
`
`
`\\$3<mmhwEmma—3m
`
`«86:5meNmmhzaoo
`
`Nmmun—002m
`
`
`
`zo.wmmm%oozo_mmmm¢n_mzoo.6.2.
`20.55mem55:00I5“.qu
`2922.2ngmg0:5&3mmmwaoozmmug
`203,328.
`
`<k<o
`
`xoogm
`
`athDOO
`
`0N
`
`Rmumbm
`
`rmthDOo
`
`
`
`Pm.mmaoozm
`
`2«mm.»m.E.(Q
`
`Page 7
`
`Page 7
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 6 0f 34
`
`US 6,624,761 B2
`
`RECEWE mmAL
`DATA BLOCK FROM
`lNPUT DATA STREAM
`
`B
`
`COUNT 3525 OF
`
`DATA BLOCK
`
`BUFFER DATA BLOCK
`
`50°
`
`502
`
`504
`
`COMPRESS DATA
`BLOCK WlTH
`ENABLED ENCODERS
`
`506
`
`APPEND CORRESPONDING
`DESTRABILITY FACTORS T0
`ENCODED DATA BLOCKS
`
`BUFFER .ENCODED DATA
`BLOCK OUTPUT
`FROM EACH ENCODER
`
`508
`
`510
`
`COUNT SiZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATios
`
`512
`
`514
`
`COMPARE COMPRESSION
`RATIOS WiTH THRESHOLD
`LIMIT
`
`516
`
`A
`FIG. 53
`
`Page 8
`
`Page 8
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 7 0f 34
`
`US 6,624,761 B2
`
`518
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`TH RESHOLD?
`
`
`
`520
`
`
`APPEND NULL
`
`
`CALCULATE FIGURE OF
`DESCRIPTOR TO
`MERIT FOR EACH ENCODED
`
`
`
`UNENCODED INPUT
`DATA BLOCK WHICH EXCEED
`
`
`DATA BLOCK
`THRESHOLD
`
`
`
`
`SELECT ENCODED DATA
`
`
`BLOCK WITH GREATEST
`
`FIGURE OF MERIT
`
`
`
`APPEND
`OUTPUT UNENCODEO
`
`
`
`DATA BLOCK WITH
`CORRESPONDING
`
`
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`
` OUTPUT ENCODED
`
`
`
`MORE
`
`DATA BLOCKS IN
`
`INPUT STREAM?
`
`
`
`YES
`
`PROCESS
`
`TERMINATE DATA
`COMPRESSION
`
`
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`534
`
`B
`
`FIG. 5b
`
`Page 9
`
`Page 9
`
`
`
`US. Patent
`
`0
`
`000
`
`SU
`
`426,
`
`2B
`
`
`
`2$51.5meNmmkzaooI3,3.2mm;Etna3$8onM«560300me&Empznoo5meE3
`
`Emu—unm
`
`FmmmDOOZw
`
`4a5538mEmmmsm
`
`w2%282$on.
`
`zoEEUmmom$538Emma«.3280
`
`war.\zoEmflQmmEoEmuuammm$085$3x003
`
`
`
`a.7,w0_n.
`
`68
`
`.mmmb
`
`
`
`ms:QmEGmmm
`
`Page 10
`
`Page 10
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 9 0f 34
`
`US 6,624,761 B2
`
`70°
`
`70
`
`B
`
`INPUT INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`710
`
`COUNT 5125 OF
`DATA BLOCK
`
`NO
`
`NO
`
`704
`
`BUFFER DATA BLOCK
`
`706
`
`INITIALIZE TIMER
`
`708
`
`BEGIN
`COMPRESSING
`DATA BLOCK WITH
`ENCODERS
`
`
`
`
`
`
`ENCODING
`COMPLETE?
`
` STOP
`
`ENCODING
`
`PROCESS
`
`
`718
`
`
`BUFFER
`BUFFER ENCODED
`
`
`DATA BLOCK FOR EACH
`
`$822393?
`ENCODER THAT
`
`
`
`FROM EACH
`COMPLETED ENCODING
`
`
`PROCESS
`
`ENCODER
`WITHIN TIME LIMiT
`
`
`
`
`
`
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`720
`
`
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`
`
`722
`
`
`
`COMPARE COMPRESSION
`
`RATIOS WITH THRESHOLD
`
`
`LIMIT
`
`
`724
`
`FIG. 73
`
`Page11
`
`Page 11
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 10 0f 34
`
`US 6,624,761 B2
`
`726
`
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`
`
`
`
`
`SELECT ENCODED
`APPEND NULL
`
`
`DATA BLOCK WITH
`DESCRIPTOR TO
`
`
`UNENCODED INPUT
`GREATEST
`
`
`DATA BLOCK
`COMPRESSION RATIO
`
`
`
`
`
`
`
`728
`
`730
`
`
`
`
`
`APPEND
`
`CORRESPONDING
`
`DESCRIPTOR
`
`OUTPUT UNENCODED
`
`DATA BLOCK WITH
`NULL DESCRIPTOR
`
`
`OUTPUT ENCODED
`DATA BLOCK WITH
`
`DESCRIPTOR
`
`
`
`
`
`
`
`MORE
`
`DATA BLOCKS IN INPUT
`
`STREAM?
`
`
`
`
`
`TERMINATE DATA
`COMPRESSION
`
`
`PROCESS
`
`
`
`740
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`
`STREAM
`
`
`FIG. 7b
`
`Page 12
`
`Page 12
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 11 0f 34
`
`US 6,624,761 B2
`
`E.(QQMQOOZN
`
`\3usummkm
`
`mOhliowmn
`
`szwmmmEOo
`
`mat.
`
`ZOFQEOmmo
`
`Emumnmmmmmooozm
`
`
`20.22.2138m«9.238mmmmooozmEowwmuoma
`
`
`zowEEEOo.mmbzaoo
`
`
`0.2mEmumpmxoodm
`
`20.336287<55.562.N$528
`
`cmmwzaoo
`
`Rummamcm«88$
`
`2
`
`Ewmusm
`
`PmMmDOozw
`
`
`
`P5.238$meEE
`
`
`
`
`
`8.8$2:ms:omEomqm
`
`.53
`
`uowas?"—
`
`hams.—
`
`ZOP<Z§EMFMD
`
`on
`
`mm0h0<u
`
`E55588«mucozm
`
`m.07..
`
`Page 13
`
`Page 13
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet12 0f34
`
`US 6,624,761 B2
`
`om
`
`20.356200
`
`0.2m
`
`2035.200\29.225568
`
`nommDOE
`
`.5me
`
`20P<Z§Mwbmo
`
`om
`
`ZO_mmwma_200
`
`mat.
`
`ZOEhEOme
`
`E,(QQNQOOZM
`
`\\sEcflmfifi
`
`«OLQEUme
`
`:-
`
`c;Uxm
`
`
`
`m.0.“—
`
`awn—002m
`
`>.:..=m<x_wmo
`
`wa0h0<m
`
`Ox.
`
`:52—
`
`<...<D
`
`mwumbm
`
`<._.<o
`
`xOOAm
`
`mthDOo
`
`EEK.“mE.<Q
`
`”ESE.
`
`om
`
`.mmmb
`
`
`
`mE:QMEBNQM
`
`Page 14
`
`Page 14
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 13 0f 34
`
`US 6,624,761 B2
`
`10°
`
`RECEIVE INITIAL
`DATA BLOCK FROM.
`INPUT DATA STREAM
`
`
`
`102
`
`B
`
`COUNT SIZE OF
`D A
`M BLOC
`
`K
`
`104
`
`'
`'
`
`BUFFER DATA
`BLOCK
`
`106
`
`INITIALIZE TIMER
`
`1 08
`
`APPLY INPUT DATA
`BLOCK TO FIRST
`ENCODING STAGE
`IN CASCADED
`
`ENCODER PATHS
`
`110
`
`1 ‘5
`
`APPLY OUTPUT
`
`
`
`
`
`
`
`OFEquhéETSEED
`
`
`STAGE T0 NEXT
`ENCODING
`STAGE IN
`CASCADE PATH
`
`
`
`
`
`
`
`
`“0
`
`
`114
`
`1 2
`1
`YES
`
`
`YES
`
`COMPLETED
`COMPLETE?
`
`ENCOD'NG
`
`STAGE
`
`
`BUFFER
`
`
`ENCODED DATA
`
`
`BLOCK OUTPUT
`FROM
`
`
`
`
`
`
`
`STOP ENCODING
`
`PROCESS
`
`
`
`
`
`
`SELECT BUFFERED OUTPUT OF LAST
`ENCODING STAGE IN ENCODER
`CASCADE THAT COMPLETED ENCODING
`
`
`PROCESS WITHIN TIME LIMIT
`
`
`
`
`
`COUNT SIZE OF
`122
`ENCODED DATA
`
`
`BLOCKS
`
`
`
`
`CALCULATE
`COMPRESSION
`
`RATIOS
`
`
`
`COMPARE COMPRESSION
`
`
`
`RATIOS WITH THRESHOLD
`LIMIT
`
`
`124
`
`126
`
`FIG. 108
`
`A
`
`Page 15
`
`Page 15
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 14 0f 34
`
`US 6,624,761 B2
`
`A
`
`128
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`THRESHOLD?
`
`
`
`
`
`
`APPEND NULL
`CALCULATE FIGURE OF
`134
`130
`MERIT FOR EACH ENCODED
`DESCRIPTOR TO
`
`
`
`UNENCODED INPUT
`DATA BLOCK WHICH EXCEED
`
`
`
`THRESHOLD
`DATA BLOCK
`
`
` SELECT ENCODED DATA
`
`136
`
`BLOCK WITH GREATEST
`
`FIGURE OF MERIT
`
`
`
`APPEND
`OUTPUT UNENCODED
`
`
`
`CORRESPONDING
`DATA BLOCK WITH
`
`
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
`
`138
`
`‘32
`
`
`
`
`OUTPUT ENCODED
`DATA BLOCK WITH
`
`
`DESCRIPTOR
`
`140
`
`
`
`142
`MORE
`
`
`DATA BLOCKS IN
`INPUT STREAM?
`
`
`
`
`YES
`
`PROCESS
`
`ERMINATE DATA
`COMPRESSION
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`
`
`3 FIG. 10b
`
`Page 16
`
`Page 16
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 15 0f 34
`
`US 6,624,761 B2
`
`332$;E.(Q
`
`EOKQEmeQ
`
`NDmmooowo_9n—mmooowo4H—
`
`ESQ$39530
`
`3291.5
`
`(badP3950
`
`mwmuzm
`
`m0mmooowo
`
`«Ohm—momma
`
`ZO_._.O<m._.Xm
`
`
`
`mmmmDmx004m
`
`<._.<DPDQ;
`Efitkm.E.<Q
`
`U
`comeOOmo
`
`3....
`
`:.OE
`
`Page 17
`
`Page 17
`
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 16 0f 34
`
`US 6,624,761 132
`
`
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`
`
`
`BUFFER DATA BLOCK
`
`
`
`EXTRACT DATA
`COMPRESSION TYPE
`
`DESCRIPTOR
`
`
`
`1200
`
`1202
`
`1 204
`
`1206
`IS DATA
`
`
`COMPRESSION
`
`TYPE DESCRIPTO
`
`
`NULL?
`
`
`
`"RECEIVE NEXT
`
`INPUT STREAM
`
`
`
`SELECT DECODER(S)
`
`
`CORRESPONDING TO
`
`
`1 220
`DESCRIPTOR
`
`
`OUTPUT
`
`
`UNDECODED
`
`
`
`DATA BLOCK
`DECODE DATA BLOCK USING
`
`
`SELECTED DECODER(S)
`
`
` DATA BLOCK IN
`
`
`OUTPUT DECODED
`DATA BLOCK
`
`
`MORE DATA
`
`
`BLOCKS IN INPUT
`
`
`STREAM?
`
`
`NO
`
`
`
`TERMINATE
`DECODING PROCESS
`
`
`1218
`
`FIG. 12
`
`Page 18
`
`Page 18
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 17 0f 34
`
`US 6,624,761 B2
`
`
`
`288523:38E950
`
`
`
`E250ago
`
`
`
`23530Sun5%.102m5853%
`
`
`
`Bmuoocmco=Emoomm
`
`
`
`.5233;62:00oz98Scam.2250
`
`rm.ouoocm
`
`On
`
`25350
`
`coacuooom
`
`355.com?3$3..
`
`9.9,
`
`
`
`<9,mmDOE
`
`Page 19
`
`Page 19
`
`
`
`
`
`
`US. Patent
`
`0
`
`74,266,
`
`2B
`
`838m
`
`.655me532.55m5?9828%8:528m.co_mwanoo
`
`
`
`
`
`a";
`
`y2.EDEoucaonvtotam
`9w0
`
`GOEOE—505.330<«022:80:35Ion?529:80:23
`
`05mm
`
`
`
`mEEmacsoototamw_7$53525essaoma0Q>P
`
`
`
`Scmmhflcsoo‘gotsm2.3UI
`
`m«meegootousm
`
`
`
`m.22:80:33.0“.m
`
`am?manor.
`
`Page 20
`
`Page 20
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 19 0f 34
`
`US 6,624,761 B2
`
`Receive initial
`
`
`
`Data Block From
`
`
`~14oo
`
`input Data
`Stream
`
`
`Count Size of
`Data Block
`”1402
`
`
`
`
`
`
`' Buffer Data Block ~14o4
`
`
`
`
`Apply Content
`Dependent Data
`Recognition
`
`
`
`‘ Recognize Data
`Stream Content
`?
`
`No—p 8
`
`FIGURE 14A
`
`Page 21
`
`Page 21
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 20 0f 34
`
`US 6,624,761 132
`
`
`
`
`
`Compress Data
`Block with
`
`
`
`~1410
`
`
`
`
`
`
`Enabled Content
`
`Independent
`Encoders
`
`
`
`Data Block Output
`
`From Each
`
`
`Buffer Encoded
`
`
`~ 1412
`
`
`
`
`
`Count Size of
`Encoded Data
`
`
`Blocks
`
`
`
`
`
`
`Calcuiate
`Compression
`Ratio
`
`
`
`Compare
`
`Compression
`
`
`Ratios with
`
`
`Threshold Limit
`
`D
`
`FIGURE 148
`
`1416
`
`1418
`
`Page 22
`
`Page 22
`
`
`
`US. Patent
`
`Sep. 23,2003
`
`Sheet 21 0f 34
`
`US 6,624,761 132
`
`
`
`Compression Ratio
`
`of at Least One Encoded Data
`
`Block Greater Than Content
`
`
`ndependent Threshol -
`1420 ~
`7
`
`
`
`
`
`Select Encoded
`Data Block with
`
`Greatest
`
`
`
`
`Append Null
`Descriptor to
`
`
`Unencoded Input
`
`
`Data Block
`Compression Ratio
`
`
`
`
`1424 ~
`
`
`
`
`
`Append
`Corresponding
`Encoding
`Descriptor
`
`
`
`
`
`1426
`
`Output Encoded
`Data Stock with
`
`
`
`
`
`Output Unenooded
`Data Block with
`Null Descriptor
`
`
`
`Descriptor
`
`
`
`
`
`
` Terminate
` More Data
`
`
`Receive Next
`Data Block From
`
`Data Compression
`Blocks in Input
`
`
`Process
`Stream
`
`
`
`
`7
`
`
`
`Input Stream
`
`Yes
`
`1 430
`
`FIGURE 14C
`
`Page 23
`
`Page 23
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 22 0f 34
`
`US 6,624,761 132
`
`1434-»
`
`
`Select Recognized
`
`
`Data Type/File or Block
`
`
`in List or Algorithm
`
`
`
`1436 -
`
`Select Content
`
`Dependent
`Algorithms)
`
`Compress Data Block
`with Enabled Content
`
`
`
`
`Dependent Enooder(s)
`
`
`1440 ~
`
`. Buffer Encoded Data
`Block From Each
`Encoder
`
`
`Count Size of Data
`
`1442
`
`Blocks
`Compression Ratio
`
`Calculate
`
`
`
`
`Compression
`
`Ratio of at Least One
`Encoded Data Block
`Greater Than
`
`
`
`
`
`
`
`
`
`
`Content Dependent
`Threshold
`
`~1448
`
`FIGURE 14D
`
`No+ F
`
`Page 24
`
`Page 24
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 23 0f 34
`
`US 6,624,761 B2
`
`
`
`EwuoocmE0338“:3on
`
`llllll. - l
`no.385
`no.ouoOcm
`ELouscw
`
`Ione
`
`so.3005Sac5:3.2500
`.52832mm5%.v.85Sun566%
`
`
`
`Escoo23
`
`coacmooum
`
`2.6.3.3
`
`5588mm
`
`3Es:
`
`355.594.
`
`9.9
`
`
`
`<9.mmDOE
`
`Page 25
`
`Page 25
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 24 0f 34
`
`US 6,624,761 B2
`
`880cm
`
`53>Sac
`
`.0520me
`
`
`
`2,82298
`
`
`
`
`Loafing2285op8.3226
`
`8526560562880822:80:23<2652.:
`gumcommoasoo.oNoancsootouama89
`
`so»
`
`mSEQoo
`
`838.5500
`
`223.5...x25m
`
`FmEo.caoo:ot:m
`
`mm«5:500:38
`
`mmemucacnutwtbm
`
`
`
`
`
`mmrmmDGE
`
`Page 26
`
`EDEmacsootwtam
`
`2059.5.323gum568..£00
`
`
`
`
`
`2385«59.83:.62:00
`
`PDEouczoOEotam
`
`Page 26
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 25 0f 34
`
`US 6,624,761 B2
`
`
`
`
`
`Receive Initial
`Data Block From
`
`Input Data
`
`
`
`
`
`Count Size of
`
`Buffer Data Btock ~1604
`
`
`
`
`'Appty Content
`
`Dependent Data
`Recognition
`
`1606
`
`Recognize Data
`Stream Content
`
`
`
`”0-, 8
`
`FIGURE 16A
`
`Page 27
`
`Page 27
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 26 0f 34
`
`US 6,624,761 132
`
`Compress Data
`Black with Enabled ~ 1610
`Content Independent
`
`Block Output From ~1612
`Each Encoder
`
`Count Size of
`Encoded Data Blocks “’16“
`
`Encoders Buffer Encoded Data
`
`Calculate
`Compression
`Ratio
`
`~1616
`
`Compare
`Compression Ratios
`with Threshold Limit
`
`1618
`
`D
`
`FIGURE 168
`
`Page 28
`
`Page 28
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 27 0f 34
`
`US 6,624,761 B2
`
`
`
`
`
`
`Compression Ratio
`of at Least One Encoded Data
`BIock Greater Than Content
`
`
`
`
`
`ndependent Threshol .
`1620 ~
`?
`
`E
`
`
`
`
`Append NuII
`SeIect Encoded
`
`Data Block with
`Descriptor to
`
`
`Unenooded Input
`Greatest
`
`
`Compression Ratio
`Data Block
`
`
`
`
`1524 ~ Corresponding
`Encoding
`
`Descriptor
`
`
`
`
` 1626
`Output Encoded
`Data Black with
`
`
`Descriptor
`
`
`
`Output Unencoded
`Data Block with
`
`NuII Descriptor
`
`
`
`
`
`
`
`Receive Next
`Data Block From
`
`
`Input Stream
`
`
`
`
`More Data
`Terminate
`BIocks in Input
`
`Data Compression
`
`
`Stream
`
`7
`
`
`Yes
`
`FIGURE 160
`
`Page 29
`
`Page 29
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 28 0f 34
`
`US 6,624,761 132
`
`Select Recognized
`1634-» Data Type/File or Block
`
`in List or Algorithm
`
`Select Content
`
`Dependent
`
`1638
`
`Compress Data Block
`with Enabled Content
`
`Buffer Encoded Data
`Block From Each
`
`Algorithm(s)
`Dependent Encoder(s)
`Encoder
`Blocks
`Compression Ratio
`
`Count Size of Data
`
`1642
`
`Calculate
`
`
`
`
`
`
`Compression
`Ratio of at Least
`One Encoded Data
`Block Greater Than
`
`
`
`
`
`Content Dependent
`
`
`
`
`FIGURE 16D
`
`No» B
`
`Page 30
`
`Page 30
`
`
`
`US. Patent
`
`n&
`
`U
`
`m
`
`2Ba
`
`
`
`
`
`238cmEmucmnausE250coacuoomm
`
`<no.385
`
`
`
`flouoocm23:8362:00
`
`No.885I89E.muoocm
`
`8.,E980
`
`
`
` Sacm«50:33:.N.2:22.800
`
`88Eng
`
`5:5
`
`505San.
`
`.5280
`
`Sam
`
`585
`
`w.mm.385w8.:%55.88m
`
`ucmcoazmoowmM8?sigmaMm«w838m
`
`ow
`
`
`
`cm.385$53a:-xoo._66559.1
`
`cofiefim
`
`Sw
`6,oE;
`
`
`
`7,<2.mung...
`
`Page 31
`
`Page 31
`
`
`
`
`
`
`US. Patent
`
`m
`
`hS
`
`3
`
`4,266,
`
`2B
`
`
`8.30.68003m9?cofimmanoW33530
`
`
`835808gum”.5
`
`85:332,5:,£8.B.8285En.m..o.::oo\..o§m
`
`9.53
`
`mmt,manor.
`
`mcm«Eczogotam
`
`MmmEecaootouamwEPEEBOtoEm
`
`4mmesczoogtamm
`
`Page 32
`
`<82
`
`2:30:83
`
`EEcuczootoesm
`
`
`
`noEmaconhotam(on?
`
`Page 32
`
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 31 0f 34
`
`US 6,624,761 B2
`
`Receive Initial
`
`Data Block From
`
`Input Data
`Stream
`
`A
`
`Count Snze of
`Data Block
`
`“1802
`
`"0+ B
`
`Buffer Data Block ~ 1804
`
`Dependent I
`Independent Data
`Recognition
`
`Recognize Data
`Stream Content
`
`FIGURE 18A
`
`Page 33
`
`Page 33
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 32 0f 34
`
`US 6,624,761 132
`
`B
`
`Estimate Optimum
`Encoder(s) From ~ 1850
`Estimation Algorithms
`
`Compress Data
`Block with Enabled
`
`Content lndependent
`Encoders
`
`Buffer Encoded Data
`
`Block Output From ~ 1812
`Each Encoder
`
`Ratio Compare
`
`Count Size of
`
`Encoded Data Blocks
`
`Calculate
`
`Compression
`
`1816
`
`Compression Ratios ~ 1818
`with Threshold Limit
`
`D
`
`FIGURE 188
`
`Page 34
`
`Page 34
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 33 0f 34
`
`US 6,624,761 B2
`
`
`
`
`Compression Ratio
`of at Least One Encoded Data
`
`
`
`ndependent Threshol .
`1820 ~
`?
`
`
`
`Block Greater Than Content
`
`
`
`
`
`
`Append Null
`Select Encoded
`Descriptor to
`Data Block with
`
`
`
`‘ Unencoded input
`Greatest
`
`Data Btock
`
`1822 ~ Compression Ratio
`
`1826
`
`
`
`Output Unenooded
`
`
`Data Block with
`Data Block with
`
`Null Descriptor
`
`
`Descriptor
`
`
`
`
`
`
`
`Terminate
`More Data
`Data Compression
`
`
`
`Biocks in input
`Process
`Stream
`
`
`7
`
`
`
`1830
`
`Receive Next
`
`
`
`
`
`Data Block From
`
`input Stream
`
`Yes
`
`FIGURE 180
`
`Page 35
`
`Page 35
`
`
`
`US. Patent
`
`Sep. 23, 2003
`
`Sheet 34 0f 34
`
`US 6,624,761 132
`
`1838~
`
`
`
`
`
`Seiect Recognized
`Data TypeIFile or Block
`
`in List or Aigorithm
`
`1840-»
`
`1842
`
`
`
`Select or Estimate
`Content Dependent
`
`Algorithm(s)
`Dependent Encoder(s)
`
`Compress Data Block
`with Enabled Content
`
`Buffer Encoded Data
`
`Block From Each
`Encoder
`
`Blocks
`
`Count Size of Data
`
`Compression Ratio
`
`Calcutate
`
`
`
`
`Compression
`
`Ratio of at Least
`
`
`One Encoded Data
`Btock Greater Than
`
`
`
`
`
`Content Dependent
`Threshold
`
`
`
`FIGURE 180
`
`No» F
`
`Page 36
`
`Page 36
`
`
`
`US 6,624,761 B2
`
`1
`CONTENT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`This application is a Continuation-In-Part of US. patent
`application Ser. No. 09/705,446,
`is now a US. Pat. No.
`6,309,424 filed on Nov. 3, 2000, which is a Continuation of
`US. patent application Ser. No. 09/210,491, filed on Dec.
`11, 1998, which is now US. Pat. No. 6,195, 024, issued on
`Feb. 27, 2001.
`
`BACKGROUND
`
`1. Technical Field
`
`The present invention relates generally to a data compres-
`sion and decompression and, more particularly, to systems
`and methods for data compression using content indepen-
`dent and content dependent data compression and decom-
`pression.
`2. Description of Related Art
`Information may be represented in a variety of manners.
`Discrete information such as text and numbers are easily
`represented in digital data. This type of data representation
`is known as symbolic digital data. Symbolic digital data is
`thus an absolute representation of data such as a letter,
`figure, character, mark, machine code, or drawing,
`Continuous information such as speech, music, audio,
`images and video, frequently exists in the natural world as
`analog information. As is well known to those skilled in the
`art, recent advances in very large scale integration (VLSI)
`digital computer technology have enabled both discrete and
`analog information to be represented with digital data.
`Continuous information represented as digital data is often
`referred to as diffuse data. Diffuse digital data is thus a
`representation of data that is of low information density and
`is typically not easily recognizable to humans in its native
`form.
`
`There are many advantages associated with digital data
`representation. For instance, digital data is more readily
`processed, stored, and transmitted due to its inherently high
`noise immunity. In addition, the inclusion of redundancy in
`digital data representation enables error detection and/or
`correction. Error detection and/or correction capabilities are
`dependent upon the amount and type of data redundancy,
`available error detection and correction processing, and
`extent of data corruption.
`One outcome of digital data representation is the continu-
`ing need for increased capacity in data processing, storage,
`and transmittal. This is especially true for diffuse data where
`increases in fidelity and resolution create exponentially
`greater quantities of data. Data compression is widely used
`to reduce the amount of data required to process, transmit,
`or store a given quantity of information. In general, there are
`two types of data compression techniques that may be
`utilized either separately or jointly to encode/decode data:
`lossless and lossy data compression.
`Lossy data compression techniques provide for an inexact
`representation of the original uncompressed data such that
`the decoded (or reconstructed) data differs from the original
`unencoded/uncompressed data. Lossy data compression is
`also known as irreversible or noisy compression. Entropy is
`defined as the quantity of information in a given set of data.
`Thus, one obvious advantage of lossy data compression is
`that the compression ratios can be larger than the entropy
`limit, all at the expense of information content. Many lossy
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`data compression techniques seek to exploit various traits
`within the human senses to eliminate otherwise impercep-
`tible data. For example, lossy data compression of visual
`imagery might seek to delete information content in excess
`of the display resolution or contrast ratio.
`On the other hand, lossless data compression techniques
`provide an exact representation of the original uncom-
`pressed data. Simply stated, the decoded (or reconstructed)
`data is identical to the original unencoded/uncompressed
`data. Lossless data compression is also known as reversible
`or noiseless compression. Thus, lossless data compression
`has, as its current limit, a minimum representation defined
`by the entropy of a given data set.
`There are various problems associated with the use of
`lossless compression techniques. One fundamental problem
`encountered with most lossless data compression techniques
`are their content sensitive behavior. This is often referred to
`as data dependency. Data dependency implies that the com-
`pression ratio achieved is highly contingent upon the content
`of the data being compressed. For example, database files
`often have large unused fields and high data redundancies,
`offering the opportunity to losslessly compress data at ratios
`of 5 to 1 or more. In contrast, concise software programs
`have little to no data redundancy and, typically, will not
`losslessly compress better than 2 to 1.
`Another problem with lossless compression is that there
`are significant variations in the compression ratio obtained
`when using a single lossless data compression technique for
`data streams having different data content and data size. This
`process is known as natural variation.
`Afurther problem is that negative compression may occur
`when certain data compression techniques act upon many
`types of highly compressed data. Highly compressed data
`appears random and many data compression techniques will
`substantially expand, not compress this type of data.
`For a given application, there are many factors that govern
`the applicability of various data compression techniques.
`These factors include compression ratio, encoding and
`decoding processing requirements, encoding and decoding
`time delays, compatibility with existing standards, and
`implementation complexity and cost, along with the adapt-
`ability and robustness to variations in input data. A direct
`relationship exists in the current art between compression
`ratio and the amount and complexity of processing required.
`One of the limiting factors in most existing prior art lossless
`data compression techniques is the rate at which the encod-
`ing and decoding processes are performed. Hardware and
`software implementation tradeoffs are often dictated by
`encoder and decoder complexity along with cost.
`Another problem associated with lossless compression
`methods is determining the optimal compression technique
`for a given set of input data and intended application. To
`combat this problem, there are many conventional content
`dependent techniques that may be utilized. For instance, file
`type descriptors are typically appended to file names to
`describe the application programs that normally act upon the
`data contained within the file. In this manner data types, data
`structures, and formats within a given file may be ascer-
`tained. Fundamental limitations with this content dependent
`technique include:
`(1) the extremely large number of application programs,
`some of which do not possess published or documented
`file formats, data structures, or data type descriptors;
`(2)
`the ability for any data compression supplier or
`consortium to acquire, store, and access the vast
`amounts of data required to identify known file descrip-
`tors and associated data types, data structures, and
`formats; and
`
`Page 37
`
`Page 37
`
`
`
`US 6,624,761 B2
`
`3
`
`(3) the rate at which new application programs are devel-
`oped and the need to update file format data descrip-
`tions accordingly.
`An alternative technique that approaches the problem of
`selecting an appropriate lossless data compression technique
`is disclosed, for example, in US. Pat. No. 5,467,087 to Chu
`entitled “High Speed Lossless Data Compression System”
`(“Chu”). FIG.
`1 illustrates an embodiment of this data
`compression and decompression technique. Data compres-
`sion 1 comprises two phases, a data pre-compression phase
`2 and a data compression phase 3. Data decompression 4 of
`a compressed input data stream is also comprised of two
`phases, a data type retrieval phase 5 and a data decompres-
`sion phase 6. During the data compression process 1, the
`data pre-compressor 2 accepts an uncompressed data stream,
`identifies the data type of the input stream, and generates a
`data type identification signal. The data compressor 3 selects
`a data compression method from a preselected set of meth-
`ods to compress the input data stream, with the intention of
`producing the best available compression ratio for that
`particular data type. There are several limitations associated
`with the Chu method. One such limitation is the need to
`
`unambiguously identify various data types. While these
`might include such common data types as ASCII, binary, or
`unicode, there, in fact, exists a broad universe of data types
`that
`fall outside the three most common data types.
`Examples of these alternate data types include: signed and
`unsigned integers of various lengths, differing types and
`precision of floating point numbers, pointers, other forms of
`character text, and a multitude of user defined data types.
`Additionally, data types may be interspersed or partially
`compressed, making data type recognition difficult and/or
`impractical. Another limitation is that given a known data
`type, or mix of data types within a specific set or subset of
`input data, it may be difficult and/or impractical to predict
`which data encoding technique yields the highest compres-
`sion ratio.
`
`there is a need for a data compression
`Accordingly,
`system and method that would address limitations in con-
`ventional data compression techniques as described above.
`
`SUMMARY OF THE INVENTION
`
`The present invention is directed to systems and methods
`for providing fast and efficient data compression using a
`combination of content independent data compression and
`content dependent data compression. In one aspect of the
`invention, a method for compressing data comprises the
`steps of:
`analyzing a data block of an input data stream to identify
`a data type of the data block, the input data stream
`comprising a plurality of disparate data types;
`performing content dependent data compression on the
`data block, if the data type of the data block is identi-
`fied;
`performing content independent data compression on the
`data block, if the data type of the data block is not
`identified.
`
`In another aspect, the step of performing content inde-
`pendent data compression comprises: encoding the data
`block with a plurality of encoders to provide a plurality of
`encoded data blocks; determining a compression ratio
`obtained for each of the encoders; comparing each of the
`determined compression ratios with a first compression
`threshold; selecting for output
`the input data block and
`appending a null compression descriptor to the input data
`block, if all of the encoder compression ratios do not meet
`
`5
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`the first compression threshold; and selecting for output the
`encoded data block having the highest compression ratio and
`appending a corresponding compression type descriptor to
`the selected encoded data block,
`if at
`least one of the
`compression ratios meet the first compression threshold.
`In another aspect, the step of performing content depen-
`dent compression comprises the steps of: selecting one or
`more encoders associated with the identified data type and
`encoding the data block with the selected encoders to
`provide a plurality of encoded data blocks; determining a
`compression ratio obtained for each of the selected encod-
`ers; comparing each of the determined compression ratios
`with a second compression threshold; selecting for output
`the input data block and appending a null compression
`descriptor to the input data block,
`if all of the encoder
`compression do not meet the second compression threshold;
`and selecting for output the encoded data block having the
`highest compression ratio and appending a corresponding
`compression type descriptor to the selected encoded data
`block, if at least one of the compression ratios meet the
`second compression threshold.
`In yet another aspect,
`the step of performing content
`independent data compression on the data block, if the data
`type of the data block is not identified, comprises the steps
`of: estimating a desirability of using of one or more encoder
`types based one characteristics of the data block; and
`compressing the data block using one or more desirable
`encoders.
`
`In another aspect, the step of performing content depen-
`dent data compression on the data block, if the data type of
`the data block is identified, comprises the steps of: estimat-
`ing a desirability of using of one or more encoder types
`based on char