`
`United States Patent
`Fallon
`
`(10) Patent NIL:
`(45) Date of Patent:
`
`US 6,624,761 B2
`*Sep. 23, 2003
`
`USO06624761B2
`
`(54)
`
`(.'0N'l'l'ZNT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`(75)
`
`Inventor:
`
`James J. Fallon, Armonlt, NY (US)
`
`(73)
`
`Assignee: Realtime Data. LLC, New York, NY
`(US)
`
`t*)
`
`Notice:
`
`Subject to any dL-selaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. t54(b) by (1 days.
`
`This patent is subject to a tr-.-.rn1inal dis-
`claimer.
`
`(31)
`
`Appl. No; l0f[|l6_.355
`
`(32
`
`Filed:
`
`(55)
`
`Oct. 29, Zfllll
`Prior I’n|)lieation Data
`
`US 2tIJ2t’fltt9't'l72 Al Jul. 25, 2002
`
`( 53)
`
`(51)
`
`(53)
`(58)
`
`(55)
`
`Related US. Application Data
`
`Continuation-in-part ofnpplicalion No. t)9,t"?t)5_.44t3, filed on
`Nov. 3. 2ttt|0_. now Pat. No. £1,300.42-1_. which is a continu-
`ation of application No. t)9t'2 ll_),49l, Filed on Dec.
`I I. l‘?|93_,
`now Pat. No. f_1,t9S_.tJ24.
`
`Int. CL? ....................... .. G061’ l3.(l2: Ci()6F 1338;
`I-103M ‘H3-1; H03M "N38
`.......................................... .. 34-l!5l; 71()t68
`U.S. Cl.
`Field of Search
`341,5], 6?‘, 106,
`341.60, 79; 382E166.
`ll_)[], 239; 7lt};’S."-2.
`68; 3481458; 7tJ'r;1n1,- 2135.431; '7rt9)2tta;
`?04.*245; 375E240
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`4_.hS2_.lS0 A "
`='1,t-.i?2_,ttfJD A
`-’l,92‘)_.tJ-it‘) A
`5,045,852 A
`5,t¥9'r'_.26I A
`5,212,?-’l2 A
`
`‘H1987 Matltes at al.
`10;’ I989 'l‘sukiynnta et al.
`5;’ 1990 O'Brien et at.
`‘Sh’ 1'39] Mitchell et at.
`3t"l‘J'9.'-_' L‘-Jngdon, Jr. et al.
`5t’ltN3 Normile et at.
`
`23:’-t-131
`
`5.2315192 A
`5.237.675 A *
`5,243,341 A
`5.2-13.348 A
`5_.270__832 A
`5_.3?'J.tJ30 A
`3.331.145 A
`i"t,39'-'l.53=l A
`5,-'3l-lJl,6-tlq A
`5..-467.087‘ A
`5.47l.2[)6 A
`5_,4'.I"9,5ST A
`5,-436_.82fi A
`5.-195.244 A
`5,533II5.l A
`5,557.74‘! A "‘
`5.533.500 A
`5,627,534 A
`5,554,703 A
`5_.Cn‘)S_.T37 A
`S_.7l?‘,393 A
`5_.7l7,3‘J=l A
`5_,?29_,22B A
`5,743,904 A
`5_.771_.340 A
`5,784,572 A
`S,?<)9,l10 A
`5.805.032 A
`
`'t'lt|'t'I‘t8
`
`‘#1993 Dnngi et al.
`3r’J993t Haltnon. Jr.
`9,.'I‘J93 Scrolls.-at et al.
`(M1093 Jackson
`l2;']99_’t Balkanski el al.
`l,tI995 Storer
`l)'l0'S'5 Allen t‘l. at.
`2/]9‘J5 Kulakowski el al.
`ItJ)'l995 Norntile at all.
`11.51095 Chll
`ll,"l995 Allen et nl.
`l2,r']905 Campbell el al.
`l;’l09t¥i Remfllard
`ZEIFJEJ6 Jeong et at.
`7,-'l99fi James
`‘}'r1*.}9f‘i Norris ...................... .. Tt')9)'3l')8
`1211996 Allen et at.
`5;'l997 Craft
`8;‘l*J97 C1.-uk,1I
`9.'l9.‘»'7
`[let
`2;'l‘»"3t8 Nakano el al.
`2t’]9'3|8 Schwartz et al.
`M1995 Franaszeli et ul.
`5,fl‘»"3t8 Huang et at.
`taflaaa Natal‘/.att)e.t al.
`7t'l99S Rustolrer et al.
`$1098 [araeltsen et al.
`931998 Kawashima el al.
`
`{List continued on next page.)
`
`I’?'t_fltrrry I:'xtmtfne:'—l’alrick Warnsley
`(74) Attorne_tf,.4gettr, or F.trm—F. Chan & Associates, LLP;
`Frank V. Deflosa, Esq.
`(57)
`
`ABSTRACT
`
`Systems and methods for providing fast and efficient data
`compression using a combination of content
`independent
`data compression and content dependent data eomprefiion.
`In one aspect. at rnethod for comtaressing data cornprises the
`steps of: analyzing a data black cat‘ an input data stream to
`identify a data type ol' the data hlock, the input data stream
`comprising a plurality of disparate data types; performing
`content dependent data oompresaion on tt1e data block, if the
`data. type of the data block. is identified; performing content
`independent data compression on the data block, it‘ the data
`type of the data block is not identified.
`
`22 Claims, 34 Drawing Sheets
`
`
`
`Oracle 1026
`
`Oracle 1026
`
`
`
`US 6,624,761 B2
`Page 2
`
`U.S. PA{l‘liN'1‘I)OCUMJ.*N'I‘.'$
`_
`__
`9!1‘4‘*8 Yanma
`10:’ 1993 Langley
`W1993 Canfield Blah
`ll_1Il'?|98 Dobson etal.
`W1-'='93 Canfidd 9! all
`120093 Cunfield er al.
`HWI99 Ryu elal.
`
`5,809,176 A
`5.818.368 A
`5,B13.~530 A
`5_.819_.215 A
`5:325.-424 A
`5,s47_.752 A
`5.8613824 A
`
`60999 Ando
`5.017.433 A
`-10.0999 Packard
`5_,9.54_342 A
`101999 Fullelnl.
`5391515 A
`32000 Gilbert at al.
`0,031,939 A
`5,195,024 I31 * 22001 Fallon
`0,309,424 B1 * 10200-1 Fallon ........ .,
`0,529,033 B1 * 32003 Easwar at 01.
`
`34051
`
`34051
`............ .. 3523230
`
`* oiled by examiner
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 1 M34
`
`US 6,624,761 B2
`
`INPUT DATA STREAM
`
`I I I I I
`
`r’
`
`1
`
`IDENTIFY INPUT DATA TYPE AND
`GENERATE DATA TYPE IDENTIFICATION
`
`2
`
`DATA TYPE
`
`ID SIGNAL
`
`SIGNAL
`
`
`COMPRESS DATA IN ACCORDANCE WITH
`IDENTIFIED DATA TYPE
`
`3
`
`I
`N
`
`RETRIEVE DATA TYPE
`INFORMATION OF COMPRESSED
`DATA STREAM
`
`DECOMPRESS DATA IN ACCORDANCE
`WITH IDENTIFIED DATA TYPE
`
`FIG. 1
`
`PRIOR ART
`
`u_n:—uu—-_-u----n.njz.:nsq..1.-—-qpn._—...g-1
`
`r-
`
`I I I
`
`I
`
`I I I I I
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 2 of 34
`
`US 6,624,761 B2
`
`zo_mmum.__2oo
`
`mn_>._.
`
`zoc.a_momwn
`
`zo_mmmE_.aoo
`
`zom_m<n_:oo
`
`__zo_._.¢z_sEm:.monmmpznou
`
`O_.._.<mEmtnm
`
`mmmmnouzm
`
`5._z_
`
`<...<n_
`
`mmtom
`
`<._-<0
`
`xoo._m
`
`mmhz300
`
`.Em..EDm
`
`:mmE,5ou
`
`2..8
`
`N.07.
`
`
`
`
`
`.3.Efimflhw.Emn_n=...mumzmooozm
`
`
`
`<._.¢..uQMQOUZW
`
`:..._m..__...=..m
`
`_.15.2300
`
`3<mEmE.3
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 3 M34
`
`US 6,624,761 B2
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`300
`
`B
`
`counrr SIZE OF
`DATA BLOCK
`
`302
`
`304
`ENCODER
`
`BUFFER DATA BLOCK
`
`COMPRESS DATA
`BLOCK WITH
`ENABLED ENCODERS
`
`BUFFER ENCODED
`DATA BLOCK OUTPUT
`FROM EACH
`
`COUNT SIZE OF
`ENC-ODED DATA
`
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`THRESHOLD LIMIT
`
`commas
`COMPRESSHDN
`RATIOS wnm
`
`31 2
`
`314
`
`FIG. 3a
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 4 of 34
`
`US 6,624,761 B2
`
`A
`
`316
`
`IS
`
`COMPRESSION
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`NO
`
`
`
`
`
`
`
`
`
`SELECT ENCODED
`DATA BLOCK WITH
`GREATEST
`COMPRESSION RATIO
`
`
`
`318
`
`
`
`
`
`APPEND NULL
`DESCRIPTOR TO
`UN ENCODED INPUT
`DATA BLOCK
`
`
`
`APPEND
`CORRESPONDING
`
`DESCRIPTOR
`
`
`
`
`
`
`
`OUTPUT ENCODED
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`OUTPUT UNENCODED
`320
`DATA BLOCK WITH
`
`NULL DESCRIPTOR
`
`
`
`
`
`
`
`MORE
`DATA BLOCKS IN INPUT
`STREAM?
`
`
`
`332 TERMINATE DATA
`
`PROCESS
`
`COMPRESSION
`
`
`
`YES
`
`330
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`
`
`
`FIG. 3b
`
`
`
`netaPamU
`
`M
`
`43.m
`
`4|:266...
`
`2B
`
`mw.o_”_
`
`S.
`
`._mmpzaooMEm_"_..5mcmmuaoozm
`
`
`
`Wzo=.<z_s_.mmEommohofi
`
`
`
`nommaoammooozm
`
`
`
`tum:>.:.__m<m.mmo
`
`2.on
`
`
`
`
`
`
`
`..zo_._.w_&5.m._m.rmn_amfismmmmmooozm<.Eo50.6 zom_m§__2oo3zo_.E_momwn_mw_m._.2DDOmm_..E:mmmpznoommat.
`
`
`
`_em.._u__..mEanomaoozm.m.zo.E.mommn_Nmmpznoo&3.sqmmhm
`
`
`
`tEmutam
`
`
`
`Fmmpzpoozwmfim3:3
`
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 6 ar34
`
`US 6,624,761 B2
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`500
`
`B
`
`COUNT SIZE or
`
`DATA BLOCK
`
`5”
`
`BUFFER DATA BLOCK
`
`504
`
`compness DATA
`BLOCK WITH
`ENABLED ENCODERS
`
`595
`
`APPEND CORESPONDING
`DESIRABILTFY FACTORS To
`ENCODED DATA BLOCKS
`
`BUFFERENCODED DATA
`BLOCK OUTPUT
`mom EACH ENCODER
`
`508
`
`510
`
`coum SIZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`compnessrou
`RATIOS
`
`513
`
`5“
`
`COMPARE COMPRESSION
`RATIOS wma THRESHOLD
`LIMIT
`
`5,5
`
`A
`FIG. 5a
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 7 of 34
`
`US 6,624,761 B2
`
`A
`
`518
`IS
`
`
`COMPRESSION
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`THRESHOLD?
`
`
`
`
`
`520
`
`
`
`
`
`APPEND NULL
`DESCRIPTOR T0
`UNENCODED INPUT
`DATA BLOCK
`
`
`
`CALCULATE FIGURE OF
` S24
`MERIT FOR EACH ENCODED
`DATA BLOCK WHICH EXCEED
`
`THRESHOLD
`
`
`SELECT ENCODED DATA
`BLOCK WITH GREATEST
`FIGURE OF MERIT
`
`
`
`
`
`
`APPEND
`CORRESPONDING
`DESCRIPTOR
`
`
`
`
`
`OUTPUT UNENCODED
`DATA BLOCK WITH
`NULL DESCRIPTOR
`
` OUTPUT ENCODED
`
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`
`INPUT STREAM?
`
`M ORE
`DATA BLOCKS IN
`
`TERMTNATE DATA
`
`
`COMPRESSION
`
`
`PROCESS
`
`STREAM
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`
`534
`
`B
`
`FIG. 5b
`
`
`
`tH8taPSU
`
`S
`
`M“
`
`SU
`
`.426...
`
`M
`
`omnouzm.
`
`3...§2<m.EmRm..._..5mMEma
`
`
`,eEmpzaoo:.<mEwESQ5EmtzmEmmooozm
`
`M.7.,oO_u_
`
`68
`
`8I
`
`4._«minceMEmnmnmcmmmnouzm
`
`zo_mmmE_2ouSn;«HommoumiumuaN«E2200
`
`mato_...<¢Rmnmam«Eazoo._mmzo_mwmm_..=._oo
`
`
`zo_E_momwn_.z_,.n_..“_uw..a_.W,u__._._...“.m.~_u.zmnH.»"mnnmminoommuiammmhzaoo
`
`émmn
`
`
`
`ms:omiuumm
`
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 9 of 34
`
`US 6,624,761 B2
`
`7°”
`
`70
`
`B
`
`INPUT INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`coum size or
`DATA BLOCK
`
`704
`
`BUFFER DATA BLOCK
`
`795
`
`mmALIzE TIMER
`
`7I ;
`
`BEGIN
`coMF-Ressme
`DATA BLOCK wm-t
`ENCODERS
`
`
`
`
`
`
`710
`
`
`“ME EXPIRED?
`
`
`
` STOP
`ENCODING
`PROCESS
`
`
`
`
`718
`BUFFER ENCODED
`
`ENc%'§;§'3ATA
`DATA BLOCK FOR EACH
`BLOCK OUTPUT
`enconsn THAT
`
`FROM EACH
`COMPLETED ENCODING
`PROCESS
`E”°°°ER
`‘MTHIN TIME LIMIT
`
`
`
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`720
`
`T22
`
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`
`
`724
`
`
`
`
`
`COMPARE COMPRESSION
`RATIOS WITH THRESHOLD
`LIMIT
`
`
`
`FIG. 73
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 10 of 34
`
`US 6,624,761 B2
`
`726
`IS
`
`
`COMPRESSION
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`
`
`
`
`
`
`SELECT ENCQDED
`DATA BLOCK WITH
`GREATEST
`COMPRESSION RATIO
`
`723
`
`
`
`APPEND NULL
`OESCRIPTOR TO
`UNENCODED INPUT
`
`DATA BLOCK
`
`
`
`
`
`
`APPEND
`CORRESPONDING
`
`DESCRIPTOR
`
`OUTPUT UNENCODED
`
`OUTPUT ENCODED
`DATA BLOCK WITH
`DATA BLOCK WITH
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`730
`
`
`
`
`
`
`
`
`MORE
`DATA BLOCKS IN INPUT
`STREAM?
`
`740
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`
`
`
`
`TERMINATE DATA
`COMPRESSION
`
`
`PROCESS
`
`
`
`FIG. 7b
`
`
`
`tn8taD1SU
`
`.3.mm%u
`
`zm.
`
`.4
`
`67.,
`
`2
`
`W.on
`
`BO1m0_n_
`
`
`
`mmnoozm
`
`
`
`>.:.__m<m_mmo8nomanor.Szo_Ez_§_Emommo.5EUEm:
`
`sEEnmnoozm
`
`
`3..zoawmfizoofi<o5%..2moimommnNmmpzaoumasEwmmnw
`Emnmammmmuooozm
`5%.xoo._mEommmoommmzo_mmmE_2oooE§
`
`
`
`
`2o_.Ez_sEmEonmmpzpoommmwooozwmmnmnmzomE...%_oo.mmtsoo3zo_.E_mommn_
`
`89.
`
`
`
`Empzaoo23$»EEEmuuzmEmmooozm
`
`4.mmm:
`
`
`
`
`
`$2:ms:omiommm
`
`
`
`
`m
`
`2
`
`fl\..S...<¢.mmc..M
`4.,mo.EEommo
`cm3.Eamnoozm
`
`B.1mOE
`
`.43
`
`S
`
`Uon9..
`
`
`
`zo_.E_mummomm9.o<“_
`
`
`
`
`
`mmaoozmmn.>._..c._.__ms..=m.u.o8zo_mmmm...__2oo
`
`U.S. Patent
`
`_:.<mEm.«C.so
`
`onM.mzom_m<%_oo
`
`5.5Szo_mmmzn_s_ou
`._x5...:o8.zoE_.2__2mE.m_ov_ooa_moW0:5.
`
`
`wuu_ommzot
`
`
`
`.mmus:mi:omiommm2IHzo:<z_:mmEo«mm:aEm:
`
`
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 13 of 34
`
`US 6,624,761 B2
`
`110
`
`‘I16
`
`
`
`APPLY OUTPUT
`
`OF COMPLETED
`ENCODING
`STAGE TO NEXT
`ENCODING
`STAGE IN
`CASCKDE PATH
`
`
`
`
`
`
`
`
`
`
`
`BUFFER
`ENCODED DATA
`BLOCK OUTPUT
`FROM
`
`STAGE
`
`COMPLETED
`ENCODING
`
`118
`
`8 TOP ENCODING
`PROCESS
`
`100
`
`102
`
`B
`
`RECEIVE INITIAL
`DATA BLOCK FROM.
`INPUT DATA STREAM
`
`
`
`COUNT SIZE OF
`DATA BLOCK
`
`104
`
`BUFFER DATA
`BLOCK
`
`105
`
`INITIALIZE TIMER
`
`1 03
`
`APPLY INPUT DATA
`BLOCK To FIRST
`ENCODING STAGE
`IN CASCADED
`
`ENCODER PATHS
`
`
`
`
`
`
`
`SELECT BUFFERED OUTPUT OF LAST
`ENCODING STAGE IN ENCODER
`CASCADE THAT COMPLETED ENCODING
`PROCESS WITHIN TIME LIMIT
`
`
`
`
`
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`
`122
`
`
`
`
` CALCULATE
`COMPRESSION
`RATIOS
`
`124
`
`126
`
`
`
`COMPARE COMPRESSION
`RATIOS WITH THRESHOLD
`LIMIT
`
`
`
`FIG. 10a
`
`A
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 14 of 34
`
`US 6,624,761 B2
`
`A
`
`128
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`ENCODED DATA BLOCK
`GREATER THAN
`THRESHOLD?
`
`
`
`
`134
`
`
`CALCULATE FIGURE or
`APPEND NULL
`130
`MERIT FOR EACH ENCODED
`oescmrrroa TO
`
`UNENCODED INPUT
`DATA BLOCK WHICH EXCEED
`
`THRESHOLD
`DATA BLOCK
`
`
`
`
`
`
`SELECT ENCODED DATA
`BLOCK WITH GREATEST
`FIGURE OF MERIT
`
`136
`
`
`
`-
`
`OUTPUT UNENCODED
`DATA BLOCK WITH
`NULL DESCRIPTOR
`
`132
`
`APPEND
`CORRESPONDING
`OESCRIPTOR
`
`
`
`138
`
`
`
`14°
`
`OUTPUT ENCODED
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`MORE
`DATA BLOCKS IN
`INPUT STREAM?
`
`
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`YES
`
`ERMINATE DATA
`COMPRESSION
`PROCESS
`
`
`
`PamU
`
`tn
`
`S
`
`3
`
`3
`
`3MEm
`
`.426cmSU
`
`7.,
`
`2
`
`.wmokfiummna.....S2SSEwe
`
`
`
`3.10..5uE._O
`
`mBmmooomo
`
`w_EmmooumoL
`
`4_.5mmooomo_
`
`3:
`
`BOM_._.OE
`
`
`
`«Ummumamzo_E<E.xmmmnmnmxoo..m
`
`
`
`
`
`
`
`simfiw5.3.5.E._o8mmooomo¢o.E_mommoSan..Saz_simfim3..E
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 16 0l'34
`
`US 6,624,761 B2
`
`1200
`
`1202
`
`
`
`
`RECEIVE mrruu.
`DATA BLOCK FROM
`
`INPUT DATA STREAM
`
`
`
`
`BUFFER DATA BLOCK
`
`EXTRACT DATA
`
`COMPRESSION TYPE
`
`DESCRIPTOR
`
`1 204
`
`1206
`
`-
`
`IS DATA
`COMPRESSION
`TYPE DESCRIPTO
`NULL?
`
`
`
`
`
`
`
`SELECT DECODER(S)
`CORRESPONDING TD
`
`DESCRIPTOR
`
`
` OUTPUT
`
`UNDECODED
`DATA BLOCK
`
`OUTPUT DECODED
`DATA BLOCK
`
`
`MORE DATA
`BLOCKS IN INPUT
`
`STREAM?
`
`1218
`
`
`
`
`
`TERMINATE
`DECODING PROCESS
`
`
`NO
`
`FIG. '12
`
`
`
`
`RECEIVE NEXT
`DATA BLOCK IN
`INPUT STREAM
`
`DECODE DATA BLOCK USING
`SELECTED DECODER(S]
`
`
`
`
`
`U.
`
`LI.net3P
`
`MB.WS
`
`43.mHm
`
`SU
`
`rm.
`
`2B167.,4
`
`
`
`
`
`SeouoocmEoucmaooEscoo
`
`
`
`W.2.9
`
`<9.manor.
`
`coa_ca3omm2_...§En.
`
`
`
`35.2..
`
`3555?.
`
`
`%Eoooocmco___:m8om
`
`
`
`Eou:3_mu:_E250ozSun.Scamfiasco
`
`.28mE25086
`
`
`
`Eoucflonmin.5%..__uo_m
`
`
`
`
`SU
`
`e
`
`3
`
`4
`
`(0
`
`4!:2,3.
`
`2B
`
`Manescaooumuam&.
`
`
`
`..nIa.NDn._$c:o0tu_=....m3..OmarP.Eescaougum
`
`.m.an2u.c:oQ._u=:m<
`
`2..uuuoocm
`
`
`
`u:_E._2an_.oE_._o«o_n_co_um2...=._._oocu_a...o.aEooW5?sunuco&<
`
`
`
`
`
`
`
`
`s§nE$n_s2_:3m.__.
`
`3323¢
`
`mm?mm:or.
`
`J.m.3
`
`cmEflczaototam
`
`mm_.mw_u_c:ootuu:m
`
`m3e2§oo_3§m
`
`
`
`m.e2:3oE§mm
`
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 19 0l'34
`
`US 6,624,761 B2
`
`
`
`Receive Initial
`
`Data Block From
`
`Input Data
`
`
`
`Buffer Data Block
`
`
`
`
`Apply Content
`Dependent Data
`Recognition
`
`1405
`
`
`Recognize Data
`Stream Conienl
`
`N°'’ 3
`
`FIGURE 14A
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 20 M34
`
`US 6,624,761 B2
`
`
`
`Compress Data
`Block with
`
`
`
`Enabled Content
`
`
`
`
`Independent
`
`Buffer Encoded
`
`From Each
`
`Count Size of
`
`
`
`Blocks
`
`
`
` Calculate
`Compression
`Ratio
`
`1416
`
`
`
`
`
`-#1418
`
`
`
`Compare
`Compression
`Ratios with
`
`Threshold Limit
`
`
`
`
`D
`
`FIGURE 14B
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 21 of 34
`
`US 6,624,761 B2
`
`
`
`
`Compression Ratio
`
`of at Least one Encoded Data
`Block Greater Than Content
`ndependent Threshol-
`1420 ~
`7
`
`
`
`
`
`
`Select Encoded
`Data Block with
`
`Greatest
`
`Compression Ratio
`
`
`
`
`Output Unencuded
`Data Black with
`
`Null Descriptor
`
`Receive Next
`
`
`Data Black From
`
`Input Stream
`
`Blacks in Input
`Strrn
`
`Terminate
`
`Data Compression
`Process
`
`
`
`
`
`
`
`Yes
`
`1430
`
`FIGURE 14C
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 22 of 34
`
`US 6,624,761 B2
`
`1434-»-
`
`
`Select Recognized
`
`
`Data Typalfila or Black
`
`
`in List or Atgorithm
`
`
`
`
`
`Cornpress Data Block
`with Enabled Content
`Dependent Enooder(s)
`
`1438
`
`- Buffer Encoded Data
`‘Block From Each
`
`1442
`
`Compression
`Ratio of at Least One
`
`Encoded Data Block
`Greater Than
`
`Content Dependent
`'l11rashold
`
`-- 1 448
`
`Encoder
`
`
`
`No-5 F
`
`
`
`
`
`
`
`Calculate
`Compression Ratio
`
`Yes
`«Ir
`
`E
`
`FIGURE 14D
`
`
`
`..lH8taPemU
`
`Sep. 23, 2003
`
`Sheet 23 of 34
`
`2B167.,42,3.6SU
`
`
`
`
`
`EouoocmEuucoaonEflcoo
`
`co_._:mouom
`
`o__n=_EmD
`
`co_._co3om
`
`.5Es:
`
`§§____3_<
`
`<3mm_:o_u_
`
`
`
`_.n_.3005
`
`anLouoocm
`
`onEuoocm
`
`E2.._o0
`
`
`23Ecam§§oo
`
`
`
`.cou5%nsun§_c_._8_msun
`
`
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 24 of 34
`
`US 6,624,761 B2
`
`Euoocm
`
`553.3
`
`§n.E_muo
`co_muEnEo0
`
`33
`
`.oa.=oman
`
`oaam:o_mmEn_Eoo
`
`
`
`sofiefi2,B<
`
`—.DEflczootauam
`
`NDEflcaootutam
`
`no2oE=ooE=...m
`
`EDescaoouéam
`
`
`
`__..._o:m2£.32mmgum.._o_m..,m._.
`
`u.9_§5.
`
`m:_E_2mn_
`
`.._£3...
`
`co_maa._n.Eo0ame2§ooE__smom._o_uoocm
`
`
`
`_.wEozaoototam_.mEuoocm
`
`§.._._..8.._.§_
`
`cmea§oQ.m§m
`
`
`
`
`
`mm...manor.
`
`
`
`
`
`EuuoocmEuuc2_mu_.._Eficoo
`
`
`
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 25 of 34
`
`US 6,624,761 B2
`
`
`
`Receive Initial
`Data Block From
`
`Input Data
`
`
`
`Buffer Data Block --1604
`
`
`
`
`
`“Apply Content
`Dependent Data
`Recognition
`
`1606
`
`
`
`Recognize Data
`Stream Content
`
`FIGURE 16A
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 26 of 34
`
`US 6,624,761 B2
`
`Compress Data
`Block with Enabled ~ 1 610
`Content Independent
`
`Block Output From --1612
`Each Encoder
`
`Enoodars Buffer Encoded Data
`
`Encoded Data Blocks
`
`Calculate
`
`Compression
`Ratio
`
`Compare
`Compnession Ratios
`with Threshold Limit
`
`1618
`
`D
`
`FIGURE 16B
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 27 or 34
`
`US 6,624,761 B2
`
`
`
`
`
`
`
`Compression Ratio
`of at Least One Enouded Data
`Block Greater Than Content
`
`
`
`ndependent Threshol -
`1520 ~
`?
`
`
`
`
`Append Null
`Descriptor to
`Unenooded Input
`Data Block
`
`Output Unancoded
`Data Block with
`
`Null Descriplor
`
` Terminate
`Data Compression
`
`FIGURE 16C
`
`
`
`
`
`
`Select Encoded
`Data Block with
`
`
`
`Greatest
`
`Compression Ratio
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 28 M34
`
`US 6,624,761 B2
`
`Salad Recognized
`1634 —- Data Typa!File or Black
`
`in List or Algorithm
`
`Select Content
`
`Algorithms)
`
`Dapendenl
`
`
`
`Compress Data Block
`with Enabled Content
`
`Dependant Encoder(s)
`
`1638
`
`
`
`1642
`
`Count Size of Data
`Blocks
`
`
`
`Calculate
`Compression Ratio
`
`Content Dependent
`
`~ 1648
`
`
`
`Yes
`1'
`
`E
`
`FIGURE 16D
`
` Compression
`
`
`
`
`
`Ratio of at Least
`
`One Encoded Data
`
`Block Greater Than
`
`No» B
`
`
`
`..lH8taPemU
`
`Sep. 23, 2003
`
`Sheet 29 of 34
`
`US 6,624,761 B2
`
`
`
`EovoucmE%:m%...._.__Eflcoocoacuooom
`
`
`
`
`
`mmmausfi%EEuoocm
`.me:afifiuo
`
`
`
`
`
` ao_n£.a:.v_o3.5.28§._._3u_<:o_ae_.mm
`.ucuco_=cm3um_
`
`o_L.—,
`
`
`
`(E.mm:e_u_
`
`
`
`
`
`EmuoocmEoucmnooEECOO
`
`<
`
`
`
`3fi.e_mu,_amm.m__23sac.x3_msunEumaw
`22:8sun
`
`Eco..o.§m._2c:oo
`
`
`
`
`SU
`
`_..HS
`
`13
`
`M
`
`(D
`
`4|:26..
`
`m.
`
`
`.23.3.332m mu.2.._.=_3
`co_$m.aEooo:_..E2on_3mac.co_$EaEooM._oa_._ommo
`
`
`
`
`.oa_..o$n€a.._..._._mm_m.5
`
`mEe2_..:ou:o_=:m
`
`WmmEafisootoeam
`
`
`
`‘WNoEscaootofimsconmrP.EEecaootozam
`
`.m.anm.§_.._._oo>_oe:m<
`
`m
`
`mmtmanor.
`
`WcmEscaaonoezm
`
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 31 of 34
`
`US 6,624,761 B2
`
`Receive Initial
`
`Data Block From
`
`Input Data
`Stream
`
`A
`
`Count Size of
`Data Block
`
`‘"1302
`
`Buffer Data Block «-1804
`
`Apply Content
`Dependent I
`Independent Data
`Recognition
`
`Recognize Data
`Stream Content
`
`No‘, B
`
`FIGURE 18A
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 32 of 34
`
`US 6,624,761 B2
`
`B
`
`Estimate Optimum
`Enooder(s) From
`Estimation Algorithms
`
`Compress Data
`Block with Enabled
`
`Content Independent
`Encoders
`
`Buffer Enooded Data
`
`Block Output From
`Each Encoder
`
`Count Size of
`
`Encoded Data Blocks
`
`Calculate
`
`Compression
`Ratio
`
`Compare
`Compnession Ratios
`with Threshold Limit
`
`D
`
`FIGURE 18B
`
`
`
`U.S. Patent
`
`Sep. 23,2003
`
`Sheet 33 of 34
`
`US 6,624,761 B2
`
`
`
`
`
`Compression Ratio
`of at Least One Encoded Data
`
`
`
`Block Greater Than Content
`
`
`
`
`Append Null
`Descriptor to
`
`Unenooded input
`Data Block
`
`
`
`
`Select Encoded
`Data Block with
`
`Greatest
`
`1322 "- Compression Ratio
`
`1824 ~
`
`
` 1326
`
`Output Unenooded
`Data Block with
`
`Output Encoded
`Data Block with
`
`
`
`
`
`
`Descriptor
`
`
`
`Null Descriptor
`
`Receive Next
`
`Data Block From
`
`Input Stream
`
`1 B30
`
`More Data
`Blocks in Input Data Compression
`
`Process
`
`
`Yes
`
`FIGURE 18C
`
`
`
`U.S. Patent
`
`Sep. 23, 2003
`
`Sheet 34 of 34
`
`US 6,624,761 B2
`
`Select Recognized
`Data TypeIFiIe or Black
`
`in List or Algorithm
`1340 --~--
`1842
`
`Saiect or Estimate
`
`Content Dependent
`Algorithms)
`
`Compress Data Black
`with Enabled Content
`
`Dependent Encodefls)
`
`Buffer Encoded Data
`
`Block Fmm Each
`Encoder
`
`Blocks
`
`Count Size of Data
`
`Compression Ratio
`
`Calculate
`
`
`Compression
`
`Ratio of at Least
`
`
`One Encoded Data
`Block Greater Than
`Content Dependent
`
`
`
`Nov F
`
`
`
`FIGURE 18D
`
`
`
`US 6,624,761 B2
`
`1
`CONTENT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`CR()SS-RI.iFERIiN(_‘li T0 R131./\Tl£D
`/'Ll’Pl..I(.‘A’l‘IONS
`
`This application is a Continuation-In-Part of U.S. patent
`application Ser. No. {J9.’?'{l5,446,
`is now a U.S. Pat. No.
`6,309,4% tiled on Nov. 3, 2000, which is a Continuation of
`U.S. patent application Ser. No. 09t21(},49t, filed on Dec.
`11, 1998, which is now U.S. Pat. No. 6,195, (I24, issued on
`Feb. 27, Ztlfll.
`
`BACKGROUND
`
`1. Technical Field
`
`The present invention relates generally to a data compres-
`sion and decompression and, more particularly, to systems
`and methods for data compression using content indepen-
`dent and content dependent data compression and decom-
`prtession.
`2. Description of Related Art
`|nl'om1ation may be represented in a variety of manners.
`Discrete information such as text and numbers are easily
`represented in digital data. This type of data representation
`is known as symbolic digital data. Symbolic digital data is
`thus an absolute representation of data such as a letter.
`figure, characte-r. mark, machine code, or drawing,
`Continuous information such as speech, music. audio.
`images and video, frequently exists in the natural world as
`analog information. As is well known to those skilled in the
`art, recent advances in very large scale integration (VLSI)
`digital computer technology have enabled both discrete and
`analog information to be represented with digital data.
`Continuous information represented as digital data is often
`referred to as dillttse data. Dilluse digital data is thus a
`representation of data that is of low informaLion density and
`is typically not easily recognizable to humans in its native
`form.
`
`There are many advantages associated with digital data
`representation. For instance, digital data is more readily
`processed, stored, and transmitted due to its inherently high
`noise immunity. In addition. the inclusion of redundancy in
`digital data representation enables error detection andfor
`correction. Error detection andtor correction capabilities are
`dependent upon the amount and type of data redundancy,
`available error detection and correction processing. and
`extent of data corruption.
`One outcome ofdigital data representation is the continu-
`ing need for increased capacity in data processing, storage,
`and transmittal. This is especially true for dilTuse data where
`increases in fidelity and resolution create exponentially
`greater quantities ot' data. Data compression is widely used
`to reduce the amount of data required to process, transmit.
`or store a given quantity of infonnation. In general, there are
`two types of data compression techniques that may be
`titilimd either separately or jointly to encode./decode data:
`lossless and lossy data compression.
`Lossy data compression techniques provide for an inexact
`representation of the original uncompressed data such that
`the decoded (or reconstructed} data differs from the original
`unencodedfuncompressed data.
`l.os5y data compression is
`also known as irreversible or noisy compression. Entropy is
`defined as the quantity of information in a given set of data.
`Thus, one obvious advantage of lossy data compression is
`that the compression ratios can be larger than the ent.ropy
`limit, all at the expense of information content. Many lossy
`
`l0
`
`I5
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`50
`
`65
`
`2
`
`data compression techniques seek to exploit various traits
`within the human senses to eliminate otherwise impercep-
`tible data. For example, lossy data compression of visual
`imagery might seek to delete information content in excess
`of the display resolution or contrast ratio.
`On the other hand, lc-ssless data compression teclmiques
`provide an exact representation of the original uncom-
`pressed data. Simply stated, the decoded (or reconstructed)
`data is identical to the original unencodedfuncompressed
`data. Lossless data compression is also known as reversible
`or noiseless compression. Thus, lossless data compression
`has, as its current limit, a minimum rept‘esentation defined
`by the entropy of a given data set.
`There are various problems associated with the ttse of
`lossless compression techniques. One fundamental problem
`encountered with most lossless data compression techniques
`are their content sensitive behavior. This is often refen'ed to
`as data dependency. Data dependency implies that the com-
`premion ratio achieved is highly contingent upon the content
`of the data being compressed. For example, database files
`often have large unused lields and high data redundancies,
`offering the opportunity to losslessly compress data at ratios
`of S to t or more. In contrast, concise software programs
`have little to no data redundancy and, typically, will not
`losslessly compress better than 2 to 1.
`Another problem with lossless compression is that there
`are significant variations in the compression ratio obtained
`when using a single lomless data compression technique for
`data streams having different data content and data sine. This
`process is known as natural variation.
`A further problem is that negative compression may occur
`when certain data compression techniques act upon many
`types of highly compressed data. llighly compressed data
`appears random and many data compression techniques will
`substantially expand, not compress this type of data.
`For a given application, there are many factors that govern
`the applicability of various data compression techniques.
`These factors include compression ratio, encoding and
`decoding processing requirements, encoding and decoding
`time delays, compatibility with existing standards, and
`implementation complexity and cost, along with the adapt-
`ability and robustness to variations in input data. /\ direct
`relationship exists in the current art between compression
`ratio and the amount and complexity of processing required.
`One of the limiting factors in most existing prior art lossless
`data compression techniques is the rate at which the encod-
`ing and decoding processes are performed. Ilardware and
`software implementation tradeoffs are often dictated by
`encoder and decoder complexity along with cost.
`Another problem associated with lossless compression
`methods is determining the optimal compression technique
`for a given set of input data and intended application. To
`combat this problem, there are many conventional content
`dependent techniques that may be utilized. For instance. file
`type descriptors are typically appended to file names to
`describe the application programs that normally act upon the
`data contained within the file. In this manner data types, data
`structures. and fonrtats within a given lite may be ascer-
`tained. Fundamental limitations with this content dependent
`technique include:
`(1) the extremely large number of application programs,
`some ofwhich do not possess published or documented
`file formats. data structures, or data type descriptors;
`(2)
`the ability for any data compression supplier or
`consortium to acquire, store, and access the vast
`amounts of data required to identify known file descrip-
`tors and associated data types, data structures, and
`formats; and
`
`
`
`3
`
`4
`
`US 6,624,761 B2
`
`|tJ
`
`I5
`
`(3) the rate at which new application programs are devel-
`oped and the need to update file fonnat data descrip-
`tions accordingly.
`An alternative technique that approaches the problem of
`selecting an appropriate lossless data compression technique
`is disclosed. for example, in U.S. Pat. No. 5,467,087 to Chu
`entitled “High Speed Lossless Data Compression System"
`(“C‘hu"). FIG.
`1
`illustrates an embodiment of this data
`compression and decompression technique. Data compres-
`sion 1 cc:-rnprises two phases, a data pre-compression phase
`2 and a data compression phase 3. l)ata decompression 4 of
`a compressed input data stream is also comprised of two
`phases, a data type retrieval phase 5 and a data decompres-
`sion phase 6. During the data compression process 1, the
`data prc-compressor 2 accepts an u.ncompressed data stream,
`identifies the data type of the input stream, and generates a
`data type identification signal. The data compressor 3 selects
`a data compression method from a preselected set of mettl-
`ods to compress the input data stream, with the intention of
`producing the best available compression ratio for that
`particular data type. There are several limitations associated
`with the Chu method. One such limitation is the need to
`unambiguously identify various data types. While these
`might include such common data types as ASCII, binary, or
`unicode, there, in fact, exists a broad universe of data types _
`that
`fall outside the three most common data types.
`Examples of these alternate data types include: signed and
`unsigned integers of various lengths. dilfering types and
`precision of floating point numbers, pointers. other tonne of
`character text, and a multitude of user defined data types.
`Additionally, data types may be interspersed or partially
`compressed, making data type recognition diflicult andfor
`impractical. Another limitation is that given a known data
`type, or mix of data types within a specific set or subset of
`input data, it may be tliflicult andior impractical to predict
`wltich data encoding technique yields the highest compres-
`sion ratio.
`there is a need for a data compression
`Accordingly,
`system and method that would address limitations in con-
`ventional data compression techniques as described above.
`
`30
`
`40
`
`SUM MARY OI? TI IE INVl£N'l‘lON
`
`The present invention is directed to systems and methods
`for providing fast and eilicient data compression using a
`combination of content independent data compression and
`content dependent data compression. In one aspect of the
`invention. a method for compressing data comprises the
`steps of:
`analyzing a data block of an input data stream to identify
`a data type of the data block, the input data stream
`comprising a plurality of disparate data types;
`performing content dependent data compression on the
`data block, if the data type of the data block is identi-
`fied;
`performing content independent data compression on the
`data block, if the data type. of the data block is not
`identilicd.
`
`In another aspect, the step of performing content inde-
`pendent data compression comprises: encoding the data
`block with it plurality of encoders to provide a plurality of
`encoded data blocks; determining a compression ratio
`obtained for each of the encoders; comparing each of the
`determined compression ratios with a first compression
`threshold; selecting for output
`the input data block and
`appending tl null compression descriptor to the inpt.It data
`block, if all of the encoder compression ratios do not meet
`
`45
`
`50
`
`55
`
`50
`
`65
`
`the first compression threshold; and selecting [or output the
`encoded data block having the higest compression ratio and
`appending a corresponding compression type descriptor to
`the selected encoded data block,
`if at
`least one of the
`compression ratios meet the llrst compression threshold.
`In another aspect, the step of performing content depen-
`dent compression comprises the stcps of: selecting one or
`more encoders associated with the identified data type and
`encoding the data block with the selected encoders to
`provide a plurality of encoded data blocks; determining a
`compression ratio obtained for each of the selected encod-
`ers; comparing each of the determined compression ratios
`with it second compression threshold; selecting for output
`the input data block and appending a null compression
`descriptor to the input data block.
`it‘ all of the encoder
`compression do not meet the second compression threshold;
`and selecting for output the encoded data block having the
`highest compression ratio and appending a corresponding
`compression type descriptor to the selected encoded data
`block, if at
`least one of the compression ratios meet the
`second compression threshold.
`In yet another aspect,
`the step of performing content
`independent data compression on the data block, it‘ the data
`type of the data block is not identified, comprises the steps
`of: estimating a desirability of using of one or more encoder
`types based one characteristics of the data block; and
`compressing the data block using one or more desirable
`encoders.
`In another aspect, the step of performing content depen-
`dent data compression on the data block, if the data type of
`the data block is identified, comprises the steps of: estimat-
`ing a desirability of using of one or more encoder types
`based on characteristics of the data block; and compressing
`the data block using one or more desirable encoders.
`In another aspect, the step of analyzing the data block
`comprises analyzing the data block to recognize one of a
`data type, data structure. data block fomiat, tile substructure.
`andlor file types. A further step comprises maintaining an
`association between encoder types and data types, data
`structures, data block formats. file substructure, andfor file
`types.
`In yet another aspect of the invention. a method for
`compressing data comprises the steps of:
`analyzing a data block of an input data stream to identify
`a data type of the data block, the input data stream
`comprising a plurality oi‘ disparate data types;
`performing content dependent data compression on the
`data block, if the data type of the data block is identi-
`fied;
`determining a compression ratio of the compressed data
`block obtained using the content dependent compres-
`sion and comparing the compression ratio with a first
`compression threshold; and
`performing content independent data compression on the
`data block, if the data type of the data block is not
`identified or if the compression ratio of the compressed
`data block obtained using the content dependent com-
`pression docs not meet the iirst compremzion threshold.
`Advantagenusly, the present invention employs a plurality
`of encoders applying a plurality of compression techniques
`on an input data stream so as to achieve maximum com-
`pression in accordance with the real-time or pseudo reat-
`timc data rate constraint. Thus. the output bit rate is not fixed
`and the amount, if any, of permissible data quality degra-
`dation is user or data spe