`Fallon
`
`(10) Patent NIL:
`(45) Date of Patent:
`
`US 6,195,024 B1
`Feb. 27, 2001
`
`US0U6l95U'24l-Bl
`
`(54)
`
`(.'0NTl*ZNT INDEPENIJENT DATA
`COMPRESSION MICTHOD AND SYSTEM
`
`(74) Attorney, Agent, or Firm—Franl< V. Dellosa; F. Chau
`& Associates, LLP
`
`(75)
`
`Inventor:
`
`James J. Fallon, Bronxvilte, NY (US)
`
`(57)
`
`AHSTR.-\C'l‘
`
`(73) Assignee: Realtitne Data. LLC. New York, NY
`(US)
`
`(
`
`i‘
`
`) Notice:
`
`Subject to any disclaimer, the term of this
`patent is extended or adjusted under 35
`U.S.C. t54(|J) by (1 days.
`
`(21) App]. No: lI9;’2'l0.49l
`
`(22
`
`Filed:
`
`Dec. 11, 1998
`
`Int. Cl.‘ ............................ .. HOJM 7134; HUBM "H00
`(51)
`l1.S. Cl.
`34l!5l; 34lt79
`(52)
`(58) Field of Search ................................ .. 34151. 79. 67;
`709.1231, 219, 236, 250; 358t1.1; 712.:-'32;
`7llt.':‘U8
`
`(56)
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`-'l.B?2_.tJUU
`4,92‘)_,9~lt‘:t
`5_.D45_.Ft52
`5.097.261
`5,175,543 “
`
`.
`
`[(131989 'l‘su.kiyan1a et al.
`5,u"l'§I‘)tJ O Brien et al.
`.
`Wl'3'9l Mitchell el al.
`.
`3ttE-'92 I_.angdon_. Jr. et al. .
`l2,»’l'§|92
`l.a11tz
`
`(|..is1 continued on next page.)
`
`341.351
`
`independent
`Systems and methods for providing content
`lossless. data com prcssiort and decompression. A data com-
`pression system includes a plurality of encoders that are
`configured to simultaneously or sequentially compress data
`independent of the data content. The results of the various
`encoders are compared to detennine if compression is
`achieved and to determine which encoder yields the highest
`lossless contpressiou ratio. The encoded data with the high-
`est lossless com pressioo ratio is then selected for subsequent
`data processing, storage, or lransmittai. A compression iden-
`tification descriptor may be appended to the encoded data
`with the highest compression ratio to enable subsequent
`decompression and data interpretation. Furthermore, a timer
`may be added to measure the time elapsed during the
`encoding process against an a priori-specilied time limit.
`When the time limit expires, only the data output from those
`encoders that have completed the encoding, process are
`compared. The encoded data with the highest compression
`ratio is selected for data processing, storage, or transmittal.
`The im posed time limit ensures that the real-time or pseudo
`real-tirnt: nature of the data encoding is preserved. Bultering
`the output from each encoder allows additional encoders to
`he sequentially applied to the output of the previous encoder,
`yielding a more optimal lossless data compression ratio.
`
`t"rt'mm'_t-' E.rrt:tttTm:r~Patriclc Wnmsley
`
`34 Claims, 16 Drawing Sheets
`
`BUFFER!
` DATA STREAM ENCODER E1
`
`COUNTER ‘I
`
`
`ENCODED DA TA
`
`BUFFER!
`STREAM We’
`ENCOD ER E2
`DESCRFPTOR
`COUNTER 2
`
`
`INPUT
`BUFFEW
`DATA
`
`ENCODER E3
`BUFFER
`COUNTER 3
`
` IUI
`
`°°M;':$lS6S'°”
`COMPRESSION
`
`TYPE
`
`”EgEhTg;:’I;T{;?v”’
`DESCRIPTION
`
`
`
`DATA
`
`BLOCK
`COUNTER
`
`
` 10
`
`
`
`
`
`ENCOD ER En
`
`
`BUFFER!
`COUNTER rt
`
`
`
`
`40
`
`Oracle 1025
`
`Oracle 1025
`
`
`
`US 6,195,024 B1
`Page .2
`
`.
`jdlnq
`
`5,717,393
`5,'?17_.394
`5,729,228
`5,748_.FN]4
`5,771,340
`5,784,572
`5,799,110
`5,305,932
`5,309,173
`5,818,368
`5,818,530
`5,319,215
`5,325,«'124
`5,847,762
`5,351,324
`5,917,438
`5,96-{.842
`5,991,515
`15,031,939
`
`271993
`271993
`371998
`511993
`571993
`'1"7’l 998
`371993
`971993
`971993
`1071993
`10711998
`1071993
`1071993
`1231998
`1719.99
`61199‘-"J
`1DN9‘.‘-‘9
`1171999
`272000
`
`* ciled by cxatniner
`
`.
`
`.
`
`.
`
`.
`
`_
`Nakano cl :1].
`Schwartz e1 31.
`Frauaszek e1 .11.
`Huang 31 al.
`.
`Nalwzalo cl al.
`.
`Rnslnker El.
`:1].
`[stat-Isen el al. .
`Kawa-shin1a cl :1].
`Yajimu .
`Langley .
`Canfield 1:-l al.
`Dobson el al.
`Canfield e1a|.
`Canlield 1:1 31.
`Ryu 91 al. .
`Ando .
`Packard .
`.
`Fall el al.
`Gi"J<.“l‘I er al.
`
`.
`
`.
`
`.
`.
`
`.
`
`.
`
`.
`
`U .5. PA'l'EN'l' I)OCUMLiN'I'S
`Normilc cl al. .
`Dungi ct 31.
`.
`Seruussi el 3]. .
`Jackson .
`Balkmmki at al.
`Slorur .
`.
`Allen el :1].
`Kulakuwski cl 3]. .
`Chang el al.
`N0-rrnile e1 :11.
`C-hu .
`.
`Allen at al.
`Canmbell at al.
`Remillard .
`J1:-Chang 31 al. _
`James .
`Allen et all. _
`Craft .
`clam, 11 .
`Her .
`
`"‘
`
`5,212,742
`531,492
`5,243.34 1
`5,243,343
`5,270,332
`5_.379,03.5
`5,331 _. 145
`5,394,534
`5.413334
`5,.461,r:79
`5,467,037
`5_.471_.206
`5,479,587
`5,436,326
`5_.495_.2-14
`5,533.051
`5,533,500
`5,327,534
`5,654,703
`5,653,737
`
`571993
`"H1993
`97'l993
`97'l993
`l27’1993
`H1995
`17’l995
`271995
`57’l995
`1071995
`1171995
`1171995
`1271995
`171996
`271996
`771996
`1271996
`571997
`371997
`971997
`
`
`
`U.S. Patent
`
`Feb. 27, 2001
`
`Sheet 1 of 16
`
`US 6,195,024 B1
`
`l'—jj‘?'jji:":j::j
`
`INPUT DATA STREAM
`
`| I I I |
`
`I I I I
`
`l—
`
`IDENTIFY INPUT DATA TYPE AND
`GENERATE DATA TYPE IDENTIFICATION
`
`2
`
`SIGNAL
`
`
`DATA TYPE
`ID SIGNAL
`COMPRESS DATA IN ACCORDANCE WITH
`IDENTIFIED DATA TYPE
`
`1
`
`I
`[J
`I
`
`—.—-n.—..
`
`3
`
`COMPRESSED DATA STREAM
`
`RETRIEVE DATA TYPE
`INFORMASLOFIX%_I;'_é‘.é)Ah;\IflPRESSED
`
`WITH IDENTIFIED DATA TYPE
`
`DECOMPRESS DATA IN ACCORDANCE
`
`I I I
`
`|
`
`I I I I I
`
`FIG. 1
`
`PRIOR ART
`
`
`
`U.S. Patent
`
`Feb. 27, 200:
`
`Sheet 2 M16
`
`US 6,195,024 B1
`
`
`
`TEKQQMQOQZM
`
`S.s_<m.Em.
`
`mokcmommo
`
`2o_mmm_mn_s_oo
`
`m_n_>._.
`
`zo_mwmE_2oo
`
`O_._.(m
`
`zo_EEomm_o
`..zo_§..z_s_mmEn_
`
`
`
`zow_mE$.00
`
`Emtam
`
`_.ymfiznoo
`
`Emninm
`
`Nmm_._.z:oo
`
`Emfism
`
`m...u_m._.ZDO0
`
`
`
`_.m_mmaoozm
`
`mmmmnoozm.
`
`
`
`mm.mmaoozm
`
`Emtam
`
`:mmpzaoo
`
`cmmmaoozm
`
`3
`
`3
`
`N.O_u_
`
`._.Dn_z_
`
`«.55
`
`mmzmom
`
`<»<n_
`
`v_oo._m
`
`xm._.z:oo
`
`3qm_.Em3..TO
`
`
`
`
`
`U.S. Patent
`
`Feb. 27, 2011:
`
`Sheet 3 of 16
`
`US 6,195,024 B1
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`300
`
`B
`
`COUNT SIZE OF
`DATA BLOCK
`
`302
`
`BUFFER DATA BLOCK
`
`304
`
`COMPRESS DATA
`BLOCK WITH
`ENABLED ENCODERS
`
`BUFFER ENCODED
`DATA BLOCK OUTPUT
`FROM EACH
`ENCODER
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`COMPARE
`COMPRESSION
`RATIOS WITH
`THRESHOLD LIMIT
`
`FIG. 3a
`
`303
`
`310
`
`312
`
`314
`
`
`
`U.S. Patent
`
`Feb. 27, 2011!
`
`Sheet. 4 of 16
`
`US 6,195,024 B1
`
`316
`
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`ENCODED DATA BLOCK
`
`GREATER THAN
`THRESHOLD?
`
`NO
`
`
`
`APPEND NULL
`DESCRIPTOR TO
`UNENCODED INPUT
`DATA BLOCK
`
`
`
`
`
`SELECT ENCODED
`DATA BLOCK WITH
`GREATEST
`COMPRESSION RATIO
`
`
`
`318
`
`320
`
`
`
`APPEND
`CORRESPONDING
`
`DESCRIPTOR
`
`
`
`
`
`
`
`OUTPUT UNENCODED
`DATA BLOCK WITH
`NULL DESCRIPTOR
`
`
`
`OUTPUT ENCOIJED
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`
`MORE
`DATA BLOCKS IN INPUT
`STREAM’?
`
`
`
`
`
`TERMINATE DATA
`COMPRESSION
`
`PROCESS
`
`
`
`
`
`330
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`
`
`
`FIG. 3b
`
`
`
`U.S. Patent
`
`Feb. 27, 2001
`
`Sheet 5 of 16
`
`US 6,195,024 B1
`
`u_OMMDOE
`
`_._mm_s_
`
`ZO_._.<Z__2Km._.mD
`
`mmooozw
`
`>:.__m<~__mmn_
`
`mmopoi
`
`w.O_u_
`
`Emrrfim
`
`:mm_._.23oo
`
`cmmmooozm
`
`moifiummoNm_m_._.z:ooIS.§_._m.Ew
`Emtsmmmmmoouzm
`
`zom_m<n_zoo.
`
`
`zo_mmmmn_s_oozo_mwm_mn__2oo._.Dn_Z_E3
`
`zo_E_mowmommmfizzoommuzammmeznoo
`
`
`
`mac,»~zo_._.._m.u.._,_.__..&mm_._.m_n_EMLLDMmmmmmoozm<55xoo._m
`
`
`
`
`
`.3....10.QMQOozm
`
`_‘mm»:300
`
`.&mu_u_Dm_Emmooozm
`
`3...¢.m.:..t..m.3..{Q
`
`
`
`
`
`U.S. Patent
`
`Feb. 27, 2011:
`
`Sheet. 5 of us
`
`US 6,195,024 B1
`
`RECEIVE INITIAL
`
`DATA BLOCK FROM INPUT DATA STREAM
`
`B
`
`COUNT SIZE OF
`
`DATA BLOCK
`
`' BUFFER DATA BLOCK
`
`COMPRESS DATA
`
`BLOCK WITH ENABLED ENCODERS
`
`500
`
`502
`
`504
`
`505
`
`508
`
`510
`
`
`
`APPEND CORRESPONDING
`DESIRABILITY FACTORS T0
`ENCODED DATA BLOCKS
`
`
`
`BUFFER ENCODED DATA
`BLOCK OUTPUT
`
`FROM EACH ENCODER
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`512
`
`514
`
`COMPARE COMPRESSION
`RATIOS WITH THRESHOLD
`
`LIMIT
`
`515
`
`FIG. 5a
`
`
`
`U.S. Patent
`
`Feb. 27, 200:
`
`Sheet. 7 M’ 16
`
`US 6,195,024 B1
`
`A
`
`518
`IS
`
`
`COMPRESSION
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`
`
`
`520
`
`APPEND NULL
`CALCULATE FIGURE OF
`DESCRIPTOR TO
`MERIT FOR EACH ENCODED
`UNENCODED INPUT
`DATA BLOCK WHICH EXCEED
`DATA BLOCK
`THRESHOLD
`
`
`
`
`SELECT ENCODED DATA
`BLOCK WITH GREATEST
`FIGURE OF MERIT
`
`OUTPUT UNENCODED
`DATA BLOCK WITH
`NULL DESCRIPTOR
`
`
`
`APPEND
`CORRESPONDING
`DESCRIPTOR
`
`
`
`
`
`
`
`
`
`
` OUTPUT ENCODED
`
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`
`
`MORE
`DATA BLOCKS IN
`INPUT STREAM?
`
`PROCESS
`
`TERMINATE DATA
`COMPRESSION
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`
`STREAM
`
`534
`
`B
`
`FIG. 5b
`
`
`
`
`
`U.S.Patent
`
`«CEOQmaoozm
`
`Emntnm
`
`Vmmfiznoo
`
`Emmaoozw
`
`s_<mEm3..<0
`
`Feb.27,2001
`Sheet.3of16
`mobimommaNmmfiznoo
`\\SsimgmEmn_“_:mmmw_m_n_OOZm_
`
`zo_mmm_xn_2oozo_mwmmn_s_oo5n_z_<59
`
`zo_E_mommommmfizsoommzmammmfizaoo
`
`
`
`
`m_n_>._..__ZO_._..v~u.~%._.__.—.%...,%_u_m_._..mwnu...N_m_u_n_DmmmENDOOZM(H40v_oo._m
`
`zow_m§2oo.
`
`Emtnm
`
`:mmtzsoo
`
`cmmmooozm
`
`US6,195,024B1©.O_n_
`
`8
`
`-mmm.:
`
`
`
`m=\<.......nm...u...om_n__m
`
`
`
`
`
`
`
`U.S. Patent
`
`Feb. 27, 200:
`
`Sheet 9 nI'16
`
`US 6,195,024 B1
`
`INPUT INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`
`
`710
`
`TIME EXPIRED?
`
`
`
`COUNT SIZE OF
`DATA BLOCK
`
`"0
`
`No
`
`
`
`ENCODING
`
`COMPLETE?
`
`
`
`BUFFER DATA BLOCK
`
`INITIALIZETIMER
`
`STOP
`ENCODING
`PROCESS
`
`718
`BUFFER ENCODED
`
`BEGIN
`BUFFER
`DATA BLOCK FOR EACH
`
`COMPRESSING
`ENCODER THAT
`
`DATA BLOCK WITH
`FROM EACH
`COMPLETED ENCODING
`
`ENCODERS
`ENCODER
`PROCESS
`WITHIN TIME LIMIT
`
`
`
`724
`
`
`
`COMPARE COMPRESSION
`RATIOS WITH THRESHOLD
`LIMIT
`
`
`
`
`
`FIG. 7a
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`700
`
`702
`
`B
`
`704
`
`705
`
`.
`
`“)8
`
`
`
`U.S. Patent
`
`Feh.27,2l]1Il
`
`Sheet 10 of 16
`
`US 6,195,024 B1
`
`726
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`ENCODED DATA BLOCK
`
`GREATER THAN
`THRESHOLD?
`
`
`
`728
`
`730
`
`
`
`
`SELECT ENCODED
`APPEND NULL
`DATA BLOCK WITH
`DESCRIPTOR TO
`
`GREATEST
`UNENCODED INPUT
`
`COMPRESSION RATIO
`DATA BLOCK
`
`
`
`
`
`APPEND
`CORRESPONDING
`DESCRIPTOR
`
`
`
`OUTPUT UNENCODED
`
` OUTPUT ENCODED
`DATA BLOCK WITH
`DATA BLOCK WITH
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
`
`
`
`
`MORE
`
`DATA BLOCKS IN INPUT
`
`STREAM?
`
`
`STREAM
`
`TERMINATE DATA
`
`
`COMPRESSION
`
`PROCESS
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`
`740
`
`FIG. 7b
`
`
`
`HetaP&U
`
`e..HS
`
`m
`
`(DS
`
`0/1’
`
`1BM.
`
`U:$5zooaEmumam
`
`cmmmooozm
`
`
`
`m...EEqmmhmEm_u_n_:mE«Qomnoozm
`
`.mmofirmommnNmmkzsoo
`
`zo_wwm.mn_2oo2.mnrfio_._.<mEm_u_n_:mM2o_mmmEs_oo
`
`
`I.zomzéizooWzo_E_mowm_n__,zo_.Ez_sEm_.m_ommmfizzoo
`
`
`
`t.Emtam
`
`rwmfizsou
`
`Vmmmaoozm
`
`mmmmooozm
`
`mmmmaoozm
`
`+:a2_
`
`¢h<D
`
`mmmuam
`
`<H<DHD&z_
`
`zoo4m
`
`xmommmooma
`
`EMHZDOQ
`
`§<mm»m¢u<a
`
`Uzo_Ez_s_mm_m_a
`
`momanor.
`
`:mms_
`
`M,m0_u_
`
`MMDOOZM
`
`>F3E<I5mD
`
`mmoeo<¢
`
`EMEF
`
`om
`
`-mmm:
`
`
`
`ms:omiommm
`
`
`
`
`
`
`
`U.S. Patent
`
`Feb. 27, 2001
`
`Sheet 12 of 16
`
`US 6,195,024 B1
`
`
`
`
`
`.......__..«R...QMQOUZM
`
`$3zqmmhm
`
`moifimomma
`
`m.O_u_
`
`zo_mmm_m_n_s_oo
`
`OC.<m_
`
`zo_Ez_2mmEo
`
`,.
`
`zom_m<%._oo
`
`n_Omanor.
`
`:mm__2
`
`zo_E2_sEmEn_
`
`8
`
`zo_mwmmm_2oo
`
`mi?
`
`zo_E_mowmn_
`
`c.EOhm
`N.EOE
`
`_...E05
`
`gemNewhem
`
`cgoa:.
`N.—Qm
`
`_.__.Qm
`
`:;m«rmram
`
`
`
`Eqmmhm.cRSu
`
`c.Nom.
`maO...m
`
`_..N05
`
`cum«mmFam
`
`‘
`
`caoa:.
`NaU_..m_
`
`2.,oa
`
`9mm«mmtam
`
`._.3n_Z_
`
`<._.<a
`
`wmrrsm
`
`<55
`
`zoo._m
`
`mmkzaoo
`
`OE»
`
`mmooozm
`
`Er_.__m_<m_mmn_
`
`mmopofi
`
`E2:
`
`8
`
`-_...m.m:
`
`
`
`m.3........nm....u.__omo_.m
`
`
`
`
`
`
`
`U.S. Patent
`
`Feb. 27, 2001
`
`Sheet 13 of I15
`
`US 6,195,024 B1
`
`100
`
`‘I02
`
`‘I04
`
`106
`
`‘I08
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`COUNT SIZE OF
`
`DATA BLOCK
`ENCODER PATHS
`
`BUFFER DATA
`BLOCK
`
`INITIALIZE TIMER
`
`APPLY INPUT DATA
`BLOCK TO FIRST
`ENCODING STAGE
`IN CASCADED
`
`,11o
`
`TIME EXPIRED?
`
`
`
`
`116
`
`NO
`
`COMPLETE?
`
`
`
`
`APPLY OUTPUT
`
`
`OF COMPLETED
`ENCODING
`
`STAGE TO NEXT
`ENCODING
`
`
`STAGE IN
`
`
`CASCADE PATH
`
`
`
`BUFFER
`ENCODED DATA
`BLOCK OUTPUT
`FROM
`COMPLETED
`ENCODING
`STAGE
`
`STOP ENCODING
`PROCESS
`
`
`
`SELECT BUFFERED OUTPUT OF LAST
`ENCODING STAGE IN ENCODER
`CASCADE THAT COMPLETED ENCODING
`PROCESS WITHIN TIME LIMIT
`
`
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`COMPARE COMPRESSION
`RATIOS WITH THRESHOLD
`LIMIT
`
`FIG. 108
`
`A
`
`
`
`U.S. Patent
`
`Feb. 27, 2001
`
`Sheet 14 of In
`
`US 6,195,024 B1
`
`128
`IS
`
`
`COMPRESSION
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`
`
`
`
`
`134
`
`
`CALCULATE FIGURE OF
`APPEND NULL
`MERIT FOR EACH ENCODED
`DESCRIPTOR TO
`
`
`DATA BLOCK WHICH EXCEED
`UNENCODED INPUT
`THRESHOLD
`DATA BLOCK
`
`
`
`.130
`
`
`
` SELECT ENCODED DATA
`
`BLOCK WITH GREATEST
`FIGURE OF MERIT
`
`136
`
`138
`
`APPEND
`CORRESPONDING
`DESCRIPTOR
`
`OUTPUT UNENCODED
`DATA BLOCK WITH
`NULL DESCRIPTOR
`
`132
`
`
`
`140
`
` OUTPUT ENCODED
`
`DATA BLOCK WITH
`DESCRIPTOR
`
`142
`
`
`
`W
`
`MORE
`DATA BLOCKS IN
`
`INPUT STREAM?
`
`
`YES
`
`STREAM
`
`ERMINATE DATA
`COMPRESSION
`PROCESS
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`
`144
`
`3 FIG. 10b
`
`
`
`..fl
`
`H
`
`m
`
`(DSU
`
`91|.
`
`1BM.
`
`M:.9:
`
`.m.5mmooumo
`
`3:
`
`S.
`
`m8:8:.8:
`
`5U
`
`tne4|...
`
`2.m
`
`DammoLE..mom.ma
`..322§3:3.
`
`RjEmmaoomo
`
`MEso3350M.Smmaooma
`§<mEm<2959.508mmooomom_o.E_momm_o<20._.Dn_z_s;_m.EmE«Q
`
`
`
`
`
`
`
`
`
`mmnmamzo_5,qExmmmumnmxooam
`
`
`
`
`
`
`
`
`U.S. Patent
`
`Feb. 27, 200:
`
`Sheet 16 of is
`
`US 6,195,024 B1
`
`1200
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`‘I 202 BUFFER DATA BLOCK
`
`1204
`
` EXTRACT DATA
`COMPRESSION TYPE
`
`DESCRIPTOR
`
`
`
`
`‘I206
`
`IS DATA
`COMPRESSION
`TYPE DESCRIPTO
`NULL?
`
`
`
`
`
`
`SELECT DECODER(S)
`CORRESPONDING TO
`
`DESCRIPTOR
`
`
`
`‘I203
`
`
`
`UNDECODED
`DATA BLOCK
`
`
`
`DECODE DATA BLOCK USING
`
`SELECTED DECODER(S)
`
`OUTPUT DECODED
`DATA BLOCK
`
`
`
`RECEIVE NEXT
`DATA BLOCK IN
`INPUT STREAM
`
`
`
`
`
`MORE DATA
`
`BLOCKS IN INPUT
`
`STREAM?
`
`
`NO
`
`TERMINATE
`DECODING PROCESS
`
`1218
`
`FIG. 12
`
`
`
`US 6,195,024 B1
`
`1
`CONTENT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`BACKG ROUND
`
`t. Technical Field
`
`ID
`
`I5
`
`The present invention relates generally to data compres-
`sion and decompression and, more particularly, to systems
`and methods for providing content independent lossless data
`compression and decompression.
`2. Description of the Related Art
`Int'on'n.'-ttion may be represented in a variety of manners.
`Discrete information such as text and numbers are easily
`represented in digital data. This type of data representation
`is known as symbolic digital data. Symbolic digital data is
`thus an absolute representation of data such as a letter,
`figure, character, mark, machine code, or drawing.
`Continuous information such as speech. music. audio.
`images and video, frequently exists in the natural world as
`analog infonnation. As is well-known to those skilled in the
`art, recent advances in very large scale integration (VLSI)
`digital computer technology have enabled both discrete and
`analog information to be represented with digital data.
`Continuous information represented as digital data is often
`referred to as dilILIse data. Dilluse digital data is thus a '
`representation of data that is of low information density and
`is typically not easily recognizable to humans in its native
`form.
`
`There are many advantages associated with digital data
`representation. For instance, digital data is more readily
`processed, stored, and transmitted due to its inherently high
`noise immunity. In addition, the inclusion of redundancy in
`digital data representation enables error detection andior
`correction. Error detection andfor correction capabilities are
`dependent upon the amount and type of data redundancy,
`available error detection and correction processing, and
`extent of data corruption.
`One outcome ofdigital data representation is the continu-
`ing need for increased capacity in data processing, storage.
`and transmittal. This is especially true for dilfuse data where
`increases in fidelity and resolution create exponentially
`greater quantities of data. Data compression is widely used
`to reduce the amount of data required to process, transmit,
`or store a given quantity of information. In general, there are
`lwo types. of data compression techniques that may be
`tttilized either separately or jointly to encodefdecode data:
`lossless and lossy data compression.
`Lossy data compression techniques provide for an inexact
`representation of the original uncompressed data such that
`the decoded (or reconstructed} data differs from the original
`unencodedtuncompressed data. Lossy data compression is
`also known as irreversible or noisy compression. Entropy is
`defined as the quantity of information in a given set of data.
`Thus, one obvious advantage of lossy data compression is
`that the compression ratios can be larger than the entropy
`limit, all at the expense of information content. Many lossy
`data compression techniques seek to exploit various traits
`within the human senses to eliminate otherwise impercep-
`tible data. lior e.\ta.I‘rIple, lnssy data compression of visual
`imagery might seek to delete information content in excess
`of the display resolution or contrast ratio.
`On the other hand, lossless data compression techniques
`provide an exact representation of the original uncom-
`pressed data. Simply stated, the decoded (or reconstructed)
`data is identical
`to the original unencodediuncornpressed
`data. Lossless data compression is also known as reversible
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`on
`
`65
`
`2
`
`or noiseless compression. 'lhtLs_. lossless data compression
`has, as its current limit, a minimum representation defined
`by the entropy of a given data set.
`There are various problems associated with the use of
`lossless compression techniques. One fundamental problem
`encountered with most lossless data compression techniques
`are their content sensitive behavior. This is often referred to
`as data dependency. Data dependency irnpties that the com-
`pression ratio achieved is highly contingent upon the content
`of the data being compressed. For example, database lites
`often have large unused fields and high data redundancies,
`offering the opportunity to losslessly compress data at ratios
`of S to 1 or more. In contrast. concise software programs
`have little to no data redundancy and, typically, will not
`losslessly compress better than 1?. to 1.
`Another problem with lossless compression is that there
`are significant variations in the compression ratio obtained
`when using a single lossless data compression technique for
`data streams having ditierent data content and data size. This
`process is lmown as natural variation.
`A further prohlern is that negative compression may occur
`when certain data compression techniques act upon many
`types of highly compressed data. Highly compressed data
`appears random and many data compression techniques will
`substantially expand. not compress this type of data.
`For a given application,
`there are many factors which
`govern the applicability of various data compression tech-
`niques. These factors include compression ratio, encoding
`and decoding processing requirements, encoding and decod-
`ing time delays, compatibility with existing standards, and
`implementation complexity and cost, along with the adapt-
`ability and robustness to variations in input data. /\ direct
`relationship exists in the current art between compression
`ratio and the amount and complexity of processing required.
`One of the limiting factors in rrlost existing prior art lossless
`data compression techniques is the rate at which the encod-
`ing and decoding processes are performed. Hardware and
`software implementation tradeotfs are often dictated by
`encoder and decoder com plcxity along with cost.
`Another problem associated with lossless compression
`methods is detennining the optimal compression technique
`for a given set of input data and intended application. To
`combat this problem, there are many conventional content
`dependent techniques which may be utilized. For instan
`ftletype descriptors are typically appended to file names to
`clescrihe the application programs that normally act upon the
`data contained within the file. In this manner data types, data
`structures, and formats within at given tile may be ascer-
`tained. Fundamcntal problems with this content dependent
`technique are:
`(1) the extremely large number of application programs.
`some of which do not possess published or documented
`file formats, data structures, or data type descriptors;
`( 2)
`the ability for any data compression supplier or
`consortium to acquire, store, and access the vast
`amounts of data required to identify known tile descrip-
`tors and associated data types, data structures, and
`formats; and
`(3) the rate at which new application programs are devel-
`oped and the need to update file format data descrip-
`tions accordingly.
`An alternative technique that approaches the problem of
`selecting an appropriate lossless data compression technique
`is disclosed in US. Pat. No. 5,467,(l87 to Chu entitled “High
`Speed lossless Data Compression System" (“C‘hu"). FIG. 1
`illustrates an embodiment of this data compression and
`
`
`
`3
`
`4
`
`US 6,195,024 B1
`
`decompression technique. Data compression 1 comprises
`two phases, a data pre-compression pham 2 and a data
`compression phase 3. Data decompression 4 of a com»
`pressed input data stream is also comprised of two phases,
`ti data type retrieval phase 5 and a data decompression phase
`6. During the data compression process 1,
`the data pre-
`compressor 2 accepts an uncompressed data stream, identi-
`fies tl1e data type of the input stream, and generates a data
`type identification signal. The data compressor 3 selects a
`data compression method from a pres.elected set of methods
`to compress the input data stream. with the intention of
`producing the best available compression ratio for that
`particular data type.
`There are several problems associated with the (Thu
`method. One such problem is the need to unambiguously
`identify various data types. While these might include such
`common data types as ASCII. binary, or unicodc. there, in
`tact, exists a broad universe of data types that fall outside the
`three most common data types. Examples of these alternate
`data types include: signed and unsigned integers of various
`lengths, difiering types and precision of floating point
`numbers. pointers, other forms of character text. and a
`multitude ol'Ltser dellrted data types. Additionally, data types
`may be interspersed or partially compressed, making data
`type recognition ditlicult andfor in1prac|_ica.l. Another prob-
`lem is that given a known data type, or mix of data types
`within a specific set or subset of input data,
`it may be
`dillicttlt andfor impractical to predict which data encoding
`technique yields the highest compression ratio.
`Chu discloses an alternate embodiment wherein a data
`
`compression rate control signal is provided to adjust specific
`parameters of the selected encoding algorithm to adjust the
`compression time lior compressing data. One problem with
`this technique is that the length of time to compress a given
`set of input data may be difficull or impractical to predict.
`Consequently, there is no guarantee that a given encoding
`algorithm or set of encoding algorithms will perform for all
`possible combinations of input data for id specific timing
`constraint. Another problem is that, by altering the param-
`eters of the encoding process,
`it may be difficult andfor
`impractical to predict the resultant compression ratio.
`Other conventional techniques have been implemented to
`address the aforementioned problems. For instance, US.
`Pat. No. 5,243,341 to Seroussi et :11. describes a class of
`Lcmpel—Ziv lossless data compression algorithms that uti-
`lize a memory based dictionary of finite size to facilitate the
`compression and decompression of data. A second standby
`dictionary is included comprised of those encoded data
`entries that compress the greatest amount of input data.
`When the current dictionary fills up and is reset. the standby
`dictionary becomes the current dictionary, thereby maintain-
`ing a reasonable data compression ratio and freeing up
`memory for newly encoded data strings. Multiple dictionar-
`ies are employed within the same encoding technique to
`increase the lossless data compression ratio. This technique
`demonstrates the prior art of using multiple dictionaries
`within a single encoding process to aid in reducing the data
`dependency of a single encoding technique. One problem
`with this method is that it does not address the difficulties in
`dealing with a wide variety of data types.
`teaches a
`US. Pat. No. 5,717,393 to Nakano, et at.
`plurality of code tables such as a high-usage code table and
`a
`low—usage code table in an entropy encoding unit. A
`block—sorted last character string from a block—sorting trans—
`forming unit is the move—to—front transfonning unit is trans-
`formed into a move-to-front (MTF) code string. The entropy
`encoding unit switches. the code tables at a discontinuous
`
`l0
`
`I5
`
`-
`
`30
`
`.
`
`40
`
`45
`
`Sf)
`
`55
`
`bf]
`
`05
`
`part of the MTI’ code string to perform entropy coding. This
`technique increases the compression rate without extending
`the block size. Nakano employs multiple code tables within
`a single entropy encoding unit to increase the lossless data
`compression ratio for a given block size, somewhat reducing
`the data dependency of the encoding algorithm. Again, the
`problem with this technique is that it does not address the
`dilliculties in dealing with a wide variety of data types.
`US. Pat. No. 5,809,] 76 to Yajima discloses a technique of
`dividing :1 native or uncompressed image data into a plu-
`rality of streams for subsequent encoding by a plurality of
`identically functioning arithmetic encoders. This method
`demonstrates the technique of employing multiple encoders
`to reduce the time of encoding tor a single method of
`compression.
`U.S. Pat. Nos. 5,583,500 and 5,471,206 to Allen, at al.
`disclose systems for parallel decompression of a data stream
`comprised of multiple code words. At least two code words
`are decoded simultaneously to enhance the decoding pro-
`cess. This technique demonstrates the prior art of utilizing
`multiple decoders to expedite the data decompression pro»
`cess.
`
`teaches a two-stage
`U.S. Pat. No. 5,635,534 to Craft
`lossless compression process. A run length precompressed
`output is post processed by a Lempel—Ziv dictionary sliding
`window dictionary encode-r that outputs a succession of
`fixed length data units. This yields a relatively high-speed
`compression technique that provides a good match between
`the capabilities and idiosyncrasies ol' the two encoding
`techniques. This technique demonstrates the prior art of
`employing sequential lossless encoders to increase the data
`compression ratio.
`U.S. Pat. No. 5,';'99,l1U to lsraelsen. et al. discloses an
`adaptive threshold technique for achieving a constant bit rate
`on a hierarchical adaptive multistage vector quantization. A
`single compression technique is applied iteratively until the
`residual
`is reduced below a prespecified threshold. The
`threshold may be adapted to provide a constant bit rate
`output. If the nth stage is reached without the residual being
`less than the threshold, a smaller input vector is selected.
`U.S. Pat. No. 5,81 9,.".’..l S to Dobson, et at. teaches a method
`of applying either loss}: or lossless compression to achieve
`a desired subjective level of quality to the reconstructed
`signal.
`In certain embodiments this technique utilizes a
`combination of run—length and I-Iutfman encoding to take
`advantage of other local and global statistics. The tradeolls
`considered in the compression process are perceptible dis-
`tortion errors versus a fixed bit rate output.
`SUMMARY OF TIIL7 INVENTION
`
`The present invention is directed to systems and methods
`for providing content independent lossless data compression
`and decompression. In one aspect of the present invention,
`a method for providing content independent lossless data
`compression comprises the steps of:
`(a) receiving as input a block of data from a stream of
`data, the data stream comprising one of at least one data
`block and a plurality of data blocks:
`(b) counting the size of the input data block;
`(C) encoding the input data block with a plurality of
`lossless encoders to provide a plurality ofencoded data
`blocks;
`of each of the encoded data blocks;
`(cl) counting the
`(e) determining a lossless data compression ratio obtained
`for each of the encoders by taking the ratio of the size
`of the encoded data block output from the encoders to
`the size of the input data block;
`
`
`
`5
`
`6
`
`US 6,195,024 B1
`
`(I) comparing each of the determined compression ratios
`with an a priori user specified compression threshold;
`g) selecting for output the input data block and append-
`ing a null data type compression descriptor to the input
`data block. if all of the encoder compression ratios fall
`below the a priori specified compression threshold; and
`(h) selecting for output the encoded data block having the
`highest compression ratio and appending it correspond-
`ing data type compressirin descriptor to the selected
`encoded data block, it‘ at least one of the compression
`ratios exceed the a priori specified compression thresh-
`old.
`
`l0
`
`l5
`
`2!]
`
`timer is
`In another aspect of the present invention, a
`preferably added to measure tl1e time elapsed during the
`encoding process against an a priori-specitied time limit.
`When the time limit expires, only the data output from those
`encoders that have completed the present encoding cycle are
`compared to determine the encoded data with the highest
`compression ratio. The time limit ensures that the real-time
`or pseudo real-time nature of the data encoding is preserved.
`In another aspect of the present invention. the results from
`each encoder are buffered to allow additional encoders to be
`sequentially applied to the output of the previous encoder.
`yielding a more optimal lossless data compression ratio.
`In another aspect of the present invention, a method for ,
`providing content independent lossless data decompression
`includes the steps of receiving as input a block of data from
`a stream of data, extracting an encoding type descriptor from
`the input data block, decoding the input data block with one
`or more of a plurality of available decoders in accordance
`with the extracted encoding type descriptor. and outputting
`the decoded data block. An input data block having a null
`descriptor type extracted therefrom is output without being
`decoded.
`
`30
`
`Advantageously, the present invention employs a plurality
`of encoders applying a plurality of compression techniques
`on an input data stream so as to achieve maximum com-
`pression in accordance with the real-time or pseudo real-
`time data rate constraint. Thus, the output hit rate is not fixed
`and the amount, if any. of permissible data quality degra-
`dation is not adaptable, but is user or data spe-cilied.
`The present invention is realized due to recent improve-
`ments in processing speed, inclusive ofdedicated analog and
`digital hardware circuits, central processing units, (and any
`hybrid combinations thereof], which, coupled with reduc-
`tions in cost. are enabling of new content independent data
`compression and decompression solutions.
`These and other aspects, features and advantages of the
`present invention will become apparent from the following
`detailed description of preferred embodiments. which is to _
`be read in connection with the accompanying drawings.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`40
`
`45
`
`FIG. 1 is a blockttlow diagram of a content dependent
`high-speed lossless data compression and decompression
`systemlmethod according to the prior art;
`FIG. 2 is a block diagram of a content independent data
`compression system according to one embodiment of the
`present invention;
`flow diagram of a data
`FIGS. 30 and 3b comprise a
`com pression method according to one aspect of the present
`invention which illustrates the operation of the data com-
`pression system of FIG. 2:
`FIG. 4 is a block diagram of a content independent data
`compression system according to another embodiment of the
`present invention having an enhanced metric for selecting an
`optimal encoding technique;
`
`S5
`
`50
`
`65
`
`FIGS. Sn and 5b comprise a flow diagram of a data
`compression method according to another aspect of the
`present invention which illustrates the operation of the data
`compression system of FIG. 4;
`FIG. 6 is a block diagram of a content independent data
`compression system according to another embodiment of the
`present
`invention having an a priori specified timer that
`provides real-time or pseudo real—time of output data;
`FIGS. 7:: and 7b comprise a flow diagram of a data
`compression method according to another aspect of the
`present invention which illustrates the operation of the data
`compression system of FIG. 6;
`FIG. 8 is a block diagram of a corttent independent data
`compression system according to another embodiment hav-
`ing an a priori specified timer that provides real—tirnc or
`pseudo real-time of output data and an enhanced metric for
`selecting an optimal encoding technique;
`FIG. 9 is a block diagram of a content independent data
`compression system according to another embodiment of the
`present invention having an encoding architecture compris-
`ing a plurality of sets of serially—cascaded encoders;
`l"I(iS. 10a and 10:‘) comprise a flow diagram of a data
`compression method according to another aspect of the
`present invention which illustrates the operation of the data
`compression system of FIG. 9;
`FIG. 11 is block diagram of a content independent data
`decompression system according to one embodiment of the
`present invention; and
`l-'IG.12is a flow diagram ofadata decompression method
`according to one aspect of the present