`(10) Patent N0.:
`(12) Unlted States Patent
`
`Fallon
`(45) Date of Patent:
`Feb. 27, 2001
`
`U5006195024B1
`
`(54) CONTENT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`(74) Attorney, Agent, or Firm—Frank V. DeRosa; F. Chau
`& Associates, LLP
`
`(75)
`
`Inventor:
`
`James J. Fallon, Bronxville, NY (US)
`
`(57)
`
`ABSTRACT
`
`(73) Assignee; Realtime Data, LLC, New York, NY
`(US)
`
`( * ) Notice:
`
`Subject to any diSClaimer, the term Of this
`PalenI l5 eXlended 0r adjllSled under 35
`U~S-C~ 154(b) by 0 daYS~
`
`(21) Appl. No.1 09/210,491
`(22)
`Filed:
`Dec. 11 1998
`’
`Int. Cl.7 ............................ .. H03M 7/34; H03M 7/00
`(51)
`(52) US. Cl.
`............................................... .. 341/51; 341/79
`(58) Field of Search ................................ .. 341/51, 79, 67;
`709/231, 219, 236, 250; 358/11; 712/32;
`711/208
`
`(56)
`
`References Cited
`Us. PATENT DOCUMENTS
`
`~
`
`10/1989 TSUklyama et al'
`498727009
`5/1990 0 Brlen et a1.
`.
`4,929,946
`9/1991 Mitchell et a1.
`.
`5 045 852
`.
`3/1992 Langdon, Jr. et a1.
`5:097:261
`5,175,543 * 12/1992 Lantz ................................... .. 341/51
`.
`.
`(L1st continued on next page.)
`
`Systems and methods for providing content independent
`lossless data compression and decompression. A data com-
`pression system includes a plurality of encoders that are
`configured to simultaneously or sequentially compress data
`independent of the data content. The results of the various
`encoders are compared to determine if compression is
`achieved and to determine Which encoder yields the highest
`lossless compression ratio. The encoded data With the high-
`est lossless compression ratio is then selected for subsequent
`data processing, storage, or transmittal. A compression iden-
`tification descriptor may be appended to the encoded data
`With the highest compression ratio to enable subsequent
`decompression and data interpretation. Furthermore, a timer
`may be added to measure the time elapsed during the
`encoding process against an a priori-specified time limit.
`When the time limit expires, only the data output from those
`encoders that have completed the encoding process are
`compared. The encoded data With the highest compression
`ratio is selected for data processing, storage, or transmittal.
`The imposed time limit ensures that the real-time or pseudo
`-
`.
`-
`-
`real-time nature of the data encoding is preserved. Buffering
`the output from each encoder allows additional encoders to
`be sequentially applied to the output 0fthe preViouS encoder,
`yielding a more optimal lossless data compression ratio.
`
`Primary Examiner—Patrick Wamsley
`
`34 Claims, 16 Drawing Sheets
`
`
`
`ENCODER E1
`
`
`
`
`BUFFER/
`COUNTER ‘1
`
`BUFFER]
`COUNTER 2
`
`
`ENCODER E2
`
`
`
`INPUT
`DATA
`BUFFER
`
`ENCODER E3
`
`BUFFER]
`
`COUNTER 3
`
`
`
`
`DA TA STREAM
`
`DATA
`BLOCK
`COUNTER
`
`ENCODED DA TA
`STREAM W/
`DESCRIPTOR
`
`
`
`COMPRESS'ON
`COMPRESSION
`
`
`DETERG—IFNCATION/
`TYPE
`DESCRIPTION
`COMPARISON
`
`
`
`
`
`
`
`
`ENCODER En
`
`
`
`
`30
`
`BUFFER/
`COUNTER n
`
`40
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 1
`
`NetApp; Rackspace Exhibit 1004 Page 1
`
`
`
`Us 6,195,024 B1
`
`Page 2
`
`.
`
`.
`
`.
`2/1998 Nakano et al.
`2/1998 Schwartz et al. .
`3/1998 Franaszek et al.
`5/1998 Huang et al.
`.
`6/1998 Nakazato et al.
`7/1998 Rostoker et al. .
`8/1998 Israelsen et al. .
`.
`9/1998 Kawashlma et al. .
`..
`9/1998 Yajrma.
`10/1998 Langley.
`10/1998 Canfield et al.
`10/1998 Dobson et al.
`10/1998 Canfield et al.
`12/1998 Canfield et al.
`1/1999 Ryu etal..
`6/1999 Ando .
`10/1999 Packard.
`11/1999 Fall et al. .
`1
`2/2000 Glb t
`t
`1
`er 6 ‘1‘ ‘
`
`.
`
`.
`
`.
`.
`
`U.S. PATENT DOCUMENTS
`.
`5/1993 Nomifle “ 9 ~
`3133: SDangl 6} a1~ ‘1
`/
`“011551 a a "
`9/1993 Jackson.
`.
`12/1993 Balkanskl et al. .
`1/1995 Storer.
`1/1995 Allen et al. .
`.
`.
`2/1995 Kulakowskr et al.
`5/1995 Chang et al.
`........................ .. 341/79
`.
`10/1995 Normlle et al. .
`11/1995 Chu.
`11/1995 Allen et al..
`12/1995 Campbell et al.
`.
`1/1996 Remrllard.
`2/1996 Je—Chang et al.
`7/1996 James .
`
`.
`
`.
`
`*
`
`12/1996 Allen et al..
`5/1997 Craft .
`8/1997 Clark, 11 .
`9/1997 Iler .
`
`53127742
`gjgiéagii
`3
`7
`5,243,348
`5,270,832
`5,379,036
`5,381,145
`5,394,534
`5,412,384
`5,461,679
`5,467,087
`5,471,206
`5,479,587
`5,486,826
`5,495,244
`5,533,051
`
`5,583,500
`5,627,534
`5,654,703
`5,668,737
`
`5,717,393
`5,717,394
`5,729,228
`5,748,904
`5,771,340
`5,784,572
`5,799,110
`5,805,932
`5,809,176
`5,818,368
`5 818 530
`’
`’
`5 819 215
`7
`7
`5,825,424
`5,847,762
`5,861,824
`5,917,438
`5,964,842
`5,991,515
`6031939
`7
`7
`
`* Cited by examiner
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 2
`
`NetApp; Rackspace Exhibit 1004 Page 2
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 1 0f 16
`
`US 6,195,024 B1
`
`'________.__________
`
`I I I I I
`
`r/
`
`1
`
`I
`I
`
`2
`
`3
`
`INPUT DATA STREAM
`
`IDENTIFY INPUT DATA TYPE AND
`GENERATE DATA TYPE IDENTIFICATION
`
`SIGNAL
`
`
`DATA TYPE
`ID SIGNAL
`
`COMPRESS DATA IN ACCORDANCE WITH
`IDENTIFIED DATA TYPE
`
`I | I I |
`
`I I
`
`I
`I
`
`COMPRESSED DATA STREAM
`
`RETRIEVE DATA TYPE
`
`INFORMATION OF COMPRESSED
`DATA STREAM
`
`WITH IDENTIFIED DATA TYPE
`
`DECOMPRESS DATA IN ACCORDANCE
`
`FIG. 1
`
`PRIOR ART
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 3
`
`NetApp; Rackspace Exhibit 1004 Page 3
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 2 0f 16
`
`US 6,195,024 B1
`
`«Q.«dQWQOOZM
`
`\\SEqmmhw
`
`KOKQEOWWQ
`
`ZO_mwwmn=>_OO
`
`m—n_>._.
`
`ZOrEEmeo
`\ZO_._.<Z__2mmFmQ
`
`ZOm_m_<n:>_OO
`
`ZO_wwmmn=>_OO
`
`O_._.<m
`
`Emu.qu
`
`rEMFZDOO
`
`Faun—Dm—
`
`N15.2300
`
`Emu.qu
`
`mmwFZDOO
`
`Emmi—3m
`
`:EMHZDOO
`
`ov
`
`N.OE
`
`Fmmmooozm
`
`mmmmooozm
`
`mmmwooozw
`
`cmmmooozm
`
`._.Dn_Z_
`
`<._.<D
`
`mm“.qu
`
`<._.<D
`
`XOOIE
`
`KMHZDOO
`
`S‘v‘mmkm«a.{Q
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 4
`
`
`
`NetApp; Rackspace Exhibit 1004 Page 4
`
`
`
`
`
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 3 0f 16
`
`US 6,195,024 B1
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`300
`
`B
`
`COUNT SIZE OF
`DATA BLOCK
`
`BUFFER DATA BLOCK
`
`COMPRESS DATA
`BLOCK WITH
`
`ENABLED ENCODERS
`
`
`
`BUFFER ENCODED
`DATA BLOCK OUTPUT
`FROM EACH
`ENCODER
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`302
`
`304
`
`303
`
`310
`
`312
`
`314
`
`COMPARE
`
`THRESHOLD LIMIT
`
`COMPRESSION
`RATIOS WITH
`
`A
`
`FIG. 3a
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 5
`
`NetApp; Rackspace Exhibit 1004 Page 5
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 4 0f 16
`
`US 6,195,024 B1
`
`316
`
`IS
`
`
`COMPRESSION
`
`RATIO OF AT LEAST ONE
`
`ENCODED DATA BLOCK
`GREATER THAN
`
`THRESHOLD?
`
`NO
`
`
`
`318
`
`320
`
`APPEND NULL
`
`
`SELECT ENCODED
`
`DESCRIPTOR TO
`DATA BLOCK WITH
`
`
`
`
`UNENCODED INPUT
`GREATEST
`
`
`DATA BLOCK
`COMPRESSION RATIO
`
`
`
`
`
`APPEND
`
`CORRESPONDING
`DESCRIPTOR
`
`
`OUTPUT UNENCODED
`
`OUTPUT ENCODED
`
`DATA BLOCK WITH
`DATA BLOCK WITH
`
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
`MORE
`
`
`DATA BLOCKS IN INPUT
`
`STREAM?
`
`
`
`
`TERMINATE DATA
`
`COMPRESSION
`
`
`PROCESS
`
`
`
`330
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`
`
`
`
`FIG. 3b
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 6
`
`NetApp; Rackspace Exhibit 1004 Page 6
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 5 0f 16
`
`US 6,195,024 B1
`
`EEmmmoozm
`
`§:EEmENEqummmmooozm
`
`Emuqu
`
`_‘NEHZDOO
`
`rmmmooozm
`
`
`
`Edwmrmtm.«Q.«\Q
`
`ZO_mwm_N_n=>_OO
`
`zOm_m<n__>_oo.
`
`zo_mmmmn_s_oo5a;E<o
`
`
`zoEEommommmpzaoo«mmqumszaoo
`
`
`
`mafi\zo:Mq_F__\<¢mEmoEEqu8mmooozm<53xogm
`
`
`
`ZO_._.<Z=2N_m._.m_DmeHo/E
`
`
`
`mOmmDGEmmooozm
`
`
`
`tmwfi>t45<m5mo
`
`v.®_n_
`
`EmELDm
`
`cmmPZDOO
`
`cmmmooozm
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 7
`
`
`
`NetApp; Rackspace Exhibit 1004 Page 7
`
`
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 6 0f 16
`
`US 6,195,024 B1
`
`B
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`BUFFER DATA BLOCK
`
`COUNT SIZE OF
`DATA BLOCK
`
`500
`
`502
`
`504
`
`COMPRESS DATA
`BLOCK WITH
`
`ENABLED ENCODERS
`
`
`APPEND CORRESPONDING
`
`
`DESIRABILITY FACTORS TO
`ENCODED DATA BLOCKS
`
`
`BUFFER ENCODED DATA
`BLOCK OUTPUT
`
`FROM EACH ENCODER
`
`508
`
`510
`
`COUNT SIZE OF
`ENCODED DATA
`BLOCKS
`
`RATIOS
`
`CALCULATE
`COMPRESSION
`
`512
`
`514
`
`COMPARE COMPRESSION
`
`RATIOS WITH THRESHOLD LIMIT
`
`516
`
`FIG. 5a
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 8
`
`NetApp; Rackspace Exhibit 1004 Page 8
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 7 0f 16
`
`US 6,195,024 B1
`
`A
`
`518
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`
`
`
`520
`
`522
`
`
`
`
`
`
`APPEND NULL
`
`
`CALCULATE FIGURE OF
`
`DESCRIPTOR TO
`MERIT FOR EACH ENCODED
`
`
`
`UNENCODED INPUT
`DATA BLOCK WHICH EXCEED
`
`
`DATA BLOCK
`THRESHOLD
`
`
`
`
`SELECT ENCODED DATA
`
`BLOCK WITH GREATEST
`FIGURE OF MERIT
`
`APPEND
`OUTPUT UNENCODED
`
`
`DATA BLOCK WITH
`CORRESPONDING
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
`
`OUTPUT ENCODED
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`PROCESS
`
`TERMINATE DATA
`COMPRESSION
`
`MORE
`
`
`DATA BLOCKS IN
`
`INPUT STREAM?
`
`YES
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`534
`
`
`
`
`B
`
`FIG. 5b
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 9
`
`NetApp; Rackspace Exhibit 1004 Page 9
`
`
`
`
`
`US. Patent
`
`b
`
`2
`
`61f08t
`
`US 6,195,024 B1
`
`2.E5emqoozm
`
`0moEEomeNmmfizaoo
`7,E:EEmEm“.qummmmooozw
`
`
`
`
`zoammmazooSag<55mm?\ZOfiMflEQMmEQEmtammm.mmooozm5.350.5mzo_mmmmas_oo
`
`
`
`
`20:25memE5300mm:qummfizaoo
`
`S20w_m<n_s_oo
`
`
`
`W..o_‘
`
`o.0_u_
`
`:EMFZDOO
`
`cEmmmnmwmmooozm
`
`om
`
`ime
`
`
`
`NEE.sznzomn‘m
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 10
`
`
`
`Em“.qu
`
`
`
`REmkzzoo:fiEmEE
`
`rmmmDOOZm
`
`
`
`NetApp; Rackspace Exhibit 1004 Page 10
`
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 9 0f 16
`
`US 6,195,024 B1
`
`700
`
`702
`
`E3
`
`704
`
`705
`
`708
`
`INPUT INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`COUNT SIZE OF
`
`DATA BLOCK
`
`71o
`
`TIME EXPIRED?
`
`
`No
`
`No
`
`
`
`
`COMPLETE?
`
`
`ENCODING
`
`
`
`
`
`BUFFER DATA BLOCK
`
`INITIALIZE TIMER
`
`STOP
`ENCODING
`PROCESS
`
`718
`BUFFER ENCODED
`
`
`BEGIN
`BUFFER
`DATA BLOCK FOR EACH
`
`
`COMPRESSING
`ENCODER THAT
`
`
`DATA BLOCK WITH
`FROM EACH
`COMPLETED ENCODING
`ENCODERS
`PROCESS
`
`
`
` ENCODER WITHIN TIME LIMIT
`
`
`
`ENCODED DATA
`BLOCKS
`
`720
`
`
`
` COUNT SIZE OF
`
` CALCULATE
`
`
`COMPRESSION
`RATIOS
`
`722
`
`
`
`COMPARE COMPRESSION
`
`
`RATIOS WITH THRESHOLD
`
`
`LIMIT
`
`
`
`724
`
`FIG. 7a
`
`NetApp;Rackspace
`
`Exhibit1004
`
`Page11
`
`NetApp; Rackspace Exhibit 1004 Page 11
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 10 0f 16
`
`US 6,195,024 B1
`
`726
`
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`ENCODED DATA BLOCK
`
`GREATER THAN
`THRESHOLD?
`
`
`NO
`
`
`
`APPEND NULL
`SELECT ENCODED
`
`
`DATA BLOCK WITH
`DESCRIPTOR TO
`
`
`UNENCODED INPUT
`GREATEST
`
`
`
`DATA BLOCK
`COMPRESSION RATIO
`
`
`
`APPEND
`
`CORRESPONDING
`DESCRIPTOR
`
`728
`
`730
`
`
`
`
`OUTPUT UNENCODED
`
`OUTPUT ENCODED
`
`
`DATA BLOCK WITH
`DATA BLOCK WITH
`
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
`
`
`
`
`MORE
`
`
`DATA BLOCKS IN INPUT
`
`STREAM?
`
`TERMINATE DATA
`
`COMPRESSION
`
`
`PROCESS
`
`
`
`740
`
`RECEIVE NEXT DATA
`
`
`BLOCK FROM INPUT
`
`STREAM
`
`
`FIG. 7b
`
`NetApp;Rackspace
`
`Exhibit1004
`
`Page12
`
`NetApp; Rackspace Exhibit 1004 Page 12
`
`
`
`
`
`US. Patent
`
`61f011teehS
`
`US 6,195,024 B1
`
`w...E:EEm
`
`b.«9&3me
`
`E3mmmoozm
`
`Emu.qu
`
`_.KMHZDOO
`
`EMHEDm.
`
`N15.2300
`
`EwnEDm
`
`mKMFZDOO
`
`EmuEDm
`
`:mmHZDOO
`
`
`
`zo_mmmmn__>_ooGEEN,zo_mmmmn=>_oo1ZOmEEEOQmzoEEommo\zo:<z__2mmflo2mat
`
`
`
`to$50:
`
`5m:
`
`ZO_._.<Z__>_mm_.m_D
`
`
`
` {
`mmmmooozm
`Nmmmooozm
`cmmmooozw
`
`rm.mmooozm
`
`HDQZ_
`
`<._.<n_
`
`awn—“Em
`
`
`
`<._.<Q.512.
`
`XOOAm
`
`\mOmmwOOma
`
`ZMHZDOO
`
`EvflmfihE.«d
`
`mmooozm
`
`>H:_m<m_wm_n_
`
`meHOdE
`
`mm=>=._.
`
`om
`
`-IWMD
`
`
`
`ME:QMIGMQW
`
`w.9“.
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 13
`
`
`
`NetApp; Rackspace Exhibit 1004 Page 13
`
`
`
`
`
`
`
`.
`C
`
`FOE
`
`
`
`
`
`
`
`c._\mN._.m.
`
`firm.
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 12 0f 16
`
`US 6,195,024 B1
`
`ZO_mwm_Mn=>_OO
`
`O_._.<m
`
`ZOw_m<n__>_OO\ZO_._.<Z__>_mm_._.m_D
`
`“.0MMDOE
`
`Elm—S.
`
`ZO_.r<Z__>_mm_.rm_n_
`
`om
`
`ZO_mmmmm_>_OO
`
`m_n_>._.
`
`ZOrrnzmome
`
`«Q.«dQMQOOZM
`
`\\SV<<mwfim
`
`IOKQEOmMQ
`
`
`
`moi
`
`:.mmNdm.
`
`C
`
`NOE
`
`c.Nm—NNm
`
`_._Nm_
`
`
`
`C
`
`EOE
`
`:6:mNEm.
`
`_\_Em
`
`m.®_n_
`
`mmooozm
`
`>._._.__m<m__mm_n_
`
`mmOhO/E
`
`Oom
`
`mm=>=H
`
`om
`
`-mmmb
`
`
`
`NEE.QMEGMQM,
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 14
`
`
`
`._.3n_Z_
`
`<._.<n_
`
`<F<D
`
`XOOJm
`
`mwnEDm
`
`EMHZDOO
`
`SEEKS.«R«d
`
`NetApp; Rackspace Exhibit 1004 Page 14
`
`
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 13 0f 16
`
`US 6,195,024 B1
`
`100
`
`102
`
`104
`
`106
`
`108
`
`RECEIVE INITIAL
`DATA BLOCK FROM
`INPUT DATA STREAM
`
`COUNT SIZE OF
`
`DATA BLOCK
`ENCODER PATHS
`
`BUFFER DATA
`BLOCK
`
`INITIALIZE TIMER
`
`APPLY INPUT DATA
`BLOCK TO FIRST
`ENCODING STAGE
`IN CASCADED
`
`TIME EXPIRED?
`
`(110
`
`NO
`
`APPLY OUTPUT
`OF COMPLETED
`ENCODING
`STAGE TO NEXT
`ENCODING
`STAGE IN
`
`I 16
`
`BUFFER
`ENCODED DATA
`
`112 YES
`
`BLOCK OUTPUT
`FROM
`
`COMPLETED
`ENCODING
`STAGE
`
`ENCODING
`STAGE
`COMPLETE?
`
`CASCADE PATH
`
`
`
`SELECT BUFFERED OUTPUT OF LAST
`
`
`ENCODING STAGE IN ENCODER
`CASCADE THAT COMPLETED ENCODING
`PROCESS WITHIN TIME LIMIT
`
`
`
`COUNT SIZE OF
`
`
`122
`ENCODED DATA
`BLOCKS
`
`STOP ENCODING
`PROCESS
`
`CALCULATE
`COMPRESSION
`RATIOS
`
`124
`
`
`
`COMPARE COMPRESSION
`RATIOS WITH THRESHOLD
`LIMIT
`
`126
`
`FIG. 103
`
`A
`
`NetApp;RaCkspace
`
`Exhibit1004
`
`Page15
`
`NetApp; Rackspace Exhibit 1004 Page 15
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 14 0f 16
`
`US 6,195,024 B1
`
`128
`IS
`
`
`COMPRESSION
`
`
`RATIO OF AT LEAST ONE
`
`
`ENCODED DATA BLOCK
`
`GREATER THAN
`
`THRESHOLD?
`
`
`
`134
`
`136
`
`BLOCK WITH GREATEST
`FIGURE OF MERIT
`
`
`
`130
`
`132
`
`140
`
`DATA BLOCK WITH
`DESCRIPTOR
`
`
`
`MORE
`
`DATA BLOCKS IN
`
`INPUT STREAM?
`
`
`
`
`YES
`
`TERMINATE DATA
`COMPRESSION
`
`
`
`
`
`RECEIVE NEXT DATA
`BLOCK FROM INPUT
`STREAM
`
`144
`
`PROCESS
`
`3 FIG. 10b
`
`NetApp;Rackspace
`
`Exhibit1004
`
`Page16
`
`
`CALCULATE FIGURE OF
`APPEND NULL
`MERIT FOR EACH ENCODED
`DESCRIPTOR TO
`
`
`DATA BLOCK WHICH EXCEED
`UNENCODED INPUT
`
`THRESHOLD
`DATA BLOCK
`
` SELECT ENCODED DATA
`
`OUTPUT UNENCODED
`APPEND
`
`
`DATA BLOCK WITH
`CORRESPONDING
`138
`
`
`NULL DESCRIPTOR
`DESCRIPTOR
`
`
`
` OUTPUT ENCODED
`
`
`
`142
`
`\
`
`NetApp; Rackspace Exhibit 1004 Page 16
`
`
`
`
`
`US. Patent
`
`2b.W...
`
`2
`
`ehS
`
`51
`
`M
`
`
`
`W«RedRDQFDO
`
`79NDmmDOOwD
`
`quZ\\S«R«d
`
`IOhntmome
`
`_.DmeOOm—D
`
`a00:.
`
`mmHEDm
`
`pm:0EMDOOMD
`
`#0:
`
`
`
`
`
`Eqflmhm,<H<DFDR—.30mommoooma
`
`
`
`No:
`
`oo:
`
`KOFQEmeD
`
`2O_._.O<m._.Xm
`
`
`
`awn—“Emx00..m_
`
`
`<._.<n_._.Dn_z_
`33$me«R«d
`
`91,6SU
`
`1BM.
`
`0.s,:o:
`
`
`
`NetApp; Rackspace
`
`Exhibit 1004
`
`Page 17
`
`NetApp; Rackspace Exhibit 1004 Page 17
`
`
`
`
`
`US. Patent
`
`Feb. 27, 2001
`
`Sheet 16 0f 16
`
`US 6,195,024 B1
`
` 1200
`RECEIVE INITIAL
`
`DATA BLOCK FROM
`
`
`INPUT DATA STREAM
`BUFFER DATA BLOCK
`
`
`
` EXTRACT DATA
`
`COMPRESSION TYPE
`
`
`DESCRIPTOR
`
`
`
`
`1202
`
`1204
`
`ISDATA
`
`
`COMPRESSON
`
`
`TYPEDESCRPTO'
`
`
`NULL?
`
`
`
`DECODE DATA BLOCK USING
`
`
`SELECT DECODER(S)
`CORRESPONDING TO
`
`
`DESCRIPTOR
`1208
`
`OUTPUT
`RECEIVE NEXT
`
`
`
`UN DECODED
`DATA BLOCK IN
`
`
`INPUT STREAM
`DATA BLOCK
`
`
`
`SELECTED DECODER(S)
`
` OUTPUT DECODED
`
`
`DATA BLOCK
`
`
`
` MOREDATA
`
`BLOCKSININPUT
`
`
`STREAM?
`
`
`
`
`
`TERMINATE
`DECODING PROCESS
`
`1218
`
`FIG. 12
`
`NetApp;Rackspace
`
`Exhibit1004
`
`Page18
`
`NetApp; Rackspace Exhibit 1004 Page 18
`
`
`
`US 6,195,024 B1
`
`1
`CONTENT INDEPENDENT DATA
`COMPRESSION METHOD AND SYSTEM
`
`BACKGROUND
`
`1. Technical Field
`
`The present invention relates generally to data compres-
`sion and decompression and, more particularly, to systems
`and methods for providing content independent lossless data
`compression and decompression.
`2. Description of the Related Art
`Information may be represented in a variety of manners.
`Discrete information such as text and numbers are easily
`represented in digital data. This type of data representation
`is known as symbolic digital data. Symbolic digital data is
`thus an absolute representation of data such as a letter,
`figure, character, mark, machine code, or drawing.
`Continuous information such as speech, music, audio,
`images and video, frequently exists in the natural world as
`analog information. As is well-known to those skilled in the
`art, recent advances in very large scale integration (VLSI)
`digital computer technology have enabled both discrete and
`analog information to be represented with digital data.
`Continuous information represented as digital data is often
`referred to as diffuse data. Diffuse digital data is thus a
`representation of data that is of low information density and
`is typically not easily recognizable to humans in its native
`form.
`
`There are many advantages associated with digital data
`representation. For instance, digital data is more readily
`processed, stored, and transmitted due to its inherently high
`noise immunity. In addition, the inclusion of redundancy in
`digital data representation enables error detection and/or
`correction. Error detection and/or correction capabilities are
`dependent upon the amount and type of data redundancy,
`available error detection and correction processing, and
`extent of data corruption.
`One outcome of digital data representation is the continu-
`ing need for increased capacity in data processing, storage,
`and transmittal. This is especially true for diffuse data where
`increases in fidelity and resolution create exponentially
`greater quantities of data. Data compression is widely used
`to reduce the amount of data required to process, transmit,
`or store a given quantity of information. In general, there are
`two types of data compression techniques that may be
`utilized either separately or jointly to encode/decode data:
`lossless and lossy data compression.
`Lossy data compression techniques provide for an inexact
`representation of the original uncompressed data such that
`the decoded (or reconstructed) data differs from the original
`unencoded/uncompressed data. Lossy data compression is
`also known as irreversible or noisy compression. Entropy is
`defined as the quantity of information in a given set of data.
`Thus, one obvious advantage of lossy data compression is
`that the compression ratios can be larger than the entropy
`limit, all at the expense of information content. Many lossy
`data compression techniques seek to exploit various traits
`within the human senses to eliminate otherwise impercep-
`tible data. For example, lossy data compression of visual
`imagery might seek to delete information content in excess
`of the display resolution or contrast ratio.
`On the other hand, lossless data compression techniques
`provide an exact representation of the original uncom-
`pressed data. Simply stated, the decoded (or reconstructed)
`data is identical to the original unencoded/uncompressed
`data. Lossless data compression is also known as reversible
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`4-0
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`
`or noiseless compression. Thus, lossless data compression
`has, as its current limit, a minimum representation defined
`by the entropy of a given data set.
`There are various problems associated with the use of
`lossless compression techniques. One fundamental problem
`encountered with most lossless data compression techniques
`are their content sensitive behavior. This is often referred to
`
`as data dependency. Data dependency implies that the com-
`pression ratio achieved is highly contingent upon the content
`of the data being compressed. For example, database files
`often have large unused fields and high data redundancies,
`offering the opportunity to losslessly compress data at ratios
`of 5 to 1 or more. In contrast, concise software programs
`have little to no data redundancy and, typically, will not
`losslessly compress better than 2 to 1.
`Another problem with lossless compression is that there
`are significant variations in the compression ratio obtained
`when using a single lossless data compression technique for
`data streams having different data content and data size. This
`process is known as natural variation.
`Afurther problem is that negative compression may occur
`when certain data compression techniques act upon many
`types of highly compressed data. Highly compressed data
`appears random and many data compression techniques will
`substantially expand, not compress this type of data.
`For a given application, there are many factors which
`govern the applicability of various data compression tech-
`niques. These factors include compression ratio, encoding
`and decoding processing requirements, encoding and decod-
`ing time delays, compatibility with existing standards, and
`implementation complexity and cost, along with the adapt-
`ability and robustness to variations in input data. A direct
`relationship exists in the current art between compression
`ratio and the amount and complexity of processing required.
`One of the limiting factors in most existing prior art lossless
`data compression techniques is the rate at which the encod-
`ing and decoding processes are performed. Hardware and
`software implementation tradeoffs are often dictated by
`encoder and decoder complexity along with cost.
`Another problem associated with lossless compression
`methods is determining the optimal compression technique
`for a given set of input data and intended application. To
`combat this problem, there are many conventional content
`dependent techniques which may be utilized. For instance,
`filetype descriptors are typically appended to file names to
`describe the application programs that normally act upon the
`data contained within the file. In this manner data types, data
`structures, and formats within a given file may be ascer-
`tained. Fundamental problems with this content dependent
`technique are:
`(1) the extremely large number of application programs,
`some of which do not possess published or documented
`file formats, data structures, or data type descriptors;
`(2)
`the ability for any data compression supplier or
`consortium to acquire, store, and access the vast
`amounts of data required to identify known file descrip-
`tors and associated data types, data structures, and
`formats; and
`(3) the rate at which new application programs are devel-
`oped and the need to update file format data descrip-
`tions accordingly.
`An alternative technique that approaches the problem of
`selecting an appropriate lossless data compression technique
`is disclosed in US. Pat. No. 5,467,087 to Chu entitled “High
`Speed Lossless Data Compression System” (“Chu”). FIG. 1
`illustrates an embodiment of this data compression and
`
`NetApp;Rackspace
`
`Exhibit1004
`
`Page19
`
`NetApp; Rackspace Exhibit 1004 Page 19
`
`
`
`3
`
`4
`
`US 6,195,024 B1
`
`decompression technique. Data compression 1 comprises
`two phases, a data pre-compression phase 2 and a data
`compression phase 3. Data decompression 4 of a com-
`pressed input data stream is also comprised of two phases,
`a data type retrieval phase 5 and a data decompression phase
`6. During the data compression process 1, the data pre-
`compressor 2 accepts an uncompressed data stream, identi-
`fies the data type of the input stream, and generates a data
`type identification signal. The data compressor 3 selects a
`data compression method from a preselected set of methods
`to compress the input data stream, with the intention of
`producing the best available compression ratio for that
`particular data type.
`There are several problems associated with the Chu
`method. One such problem is the need to unambiguously
`identify various data types. While these might include such
`common data types as ASCII, binary, or unicode, there, in
`fact, exists a broad universe of data types that fall outside the
`three most common data types. Examples of these alternate
`data types include: signed and unsigned integers of various
`lengths, differing types and precision of floating point
`numbers, pointers, other forms of character text, and a
`multitude of user defined data types. Additionally, data types
`may be interspersed or partially compressed, making data
`type recognition difficult and/or impractical. Another prob-
`lem is that given a known data type, or mix of data types
`within a specific set or subset of input data,
`it may be
`difficult and/or impractical to predict which data encoding
`technique yields the highest compression ratio.
`Chu discloses an alternate embodiment wherein a data
`
`compression rate control signal is provided to adjust specific
`parameters of the selected encoding algorithm to adjust the
`compression time for compressing data. One problem with
`this technique is that the length of time to compress a given
`set of input data may be difficult or impractical to predict.
`Consequently, there is no guarantee that a given encoding
`algorithm or set of encoding algorithms will perform for all
`possible combinations of input data for a specific timing
`constraint. Another problem is that, by altering the param-
`eters of the encoding process,
`it may be difficult and/or
`impractical to predict the resultant compression ratio.
`Other conventional techniques have been implemented to
`address the aforementioned problems. For instance, US.
`Pat. No. 5,243,341 to Seroussi et al. describes a class of
`Lempel-Ziv lossless data compression algorithms that uti-
`lize a memory based dictionary of finite size to facilitate the
`compression and decompression of data. A second standby
`dictionary is included comprised of those encoded data
`entries that compress the greatest amount of input data.
`When the current dictionary fills up and is reset, the standby
`dictionary becomes the current dictionary, thereby maintain-
`ing a reasonable data compression ratio and freeing up
`memory for newly encoded data strings. Multiple dictionar-
`ies are employed within the same encoding technique to
`increase the lossless data compression ratio. This technique
`demonstrates the prior art of using multiple dictionaries
`within a single encoding process to aid in reducing the data
`dependency of a single encoding technique. One problem
`with this method is that it does not address the difficulties in
`
`dealing with a wide variety of data types.
`teaches a
`US. Pat. No. 5,717,393 to Nakano, et al.
`plurality of code tables such as a high-usage code table and
`a low-usage code table in an entropy encoding unit. A
`block-sorted last character string from a block-sorting trans-
`forming unit is the move-to-front transforming unit is trans-
`formed into a move-to-front (MTF) code string. The entropy
`encoding unit switches the code tables at a discontinuous
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`part of the MTF code string to perform entropy coding. This
`technique increases the compression rate without extending
`the block size. Nakano employs multiple code tables within
`a single entropy encoding unit to increase the lossless data
`compression ratio for a given block size, somewhat reducing
`the data dependency of the encoding algorithm. Again, the
`problem with this technique is that it does not address the
`difficulties in dealing with a wide variety of data types.
`US. Pat. No. 5,809,176 to Yajima discloses a technique of
`dividing a native or uncompressed image data into a plu-
`rality of streams for subsequent encoding by a plurality of
`identically functioning arithmetic encoders. This method
`demonstrates the technique of employing multiple encoders
`to reduce the time of encoding for a single method of
`compression.
`US. Pat. Nos. 5,583,500 and 5,471,206 to Allen, at al.
`disclose systems for parallel decompression of a data stream
`comprised of multiple code words. At least two code words
`are decoded simultaneously to enhance the decoding pro-
`cess. This technique demonstrates the prior art of utilizing
`multiple decoders to expedite the data decompression pro-
`cess.
`
`US. Pat. No. 5,627,534 to Craft teaches a two-stage
`lossless compression process. A run length precompressed
`output is post processed by a Lempel-Ziv dictionary sliding
`window dictionary encoder that outputs a succession of
`fixed length data units. This yields a relatively high-speed
`compression technique that provides a good match between
`the capabilities and idiosyncrasies of the two encoding
`techniques. This technique demonstrates the prior art of
`employing sequential lossless encoders to increase the data
`compression ratio.
`US. Pat. No. 5,799,110 to Israelsen, et al. discloses an
`adaptive threshold technique for achieving a constant bit rate
`on a hierarchical adaptive multistage vector quantization. A
`single compression technique is applied iteratively until the
`residual
`is reduced below a prespecified threshold. The
`threshold may be adapted to provide a constant bit rate
`output. If the nth stage is reached without the residual being
`less than the threshold, a smaller input vector is selected.
`US. Pat. No. 5,819,215 to Dobson, et al. teaches a method
`of applying either lossy or lossless compression to achieve
`a desired subjective level of quality to the reconstructed
`signal.
`In certain embodiments this technique utilizes a
`combination of run-length and Huffman encoding to take
`advantage of other local and global statistics. The tradeoffs
`considered in the compression process are perceptible dis-
`tortion errors versus a fixed bit rate output.
`SUMMARY OF THE INVENTION
`
`The present invention is directed to systems and methods
`for providing content independent lossless data compression
`and decompression. In one aspect of the present invention,
`a method for providing content independent lossless data
`compression comprises the steps of:
`(a) receiving as input a block of data from a stream of
`data, the data stream comprising one of at least one data
`block and a plurality of data blocks;
`(b) counting the size of the input data block;
`(c) encoding the input data block with a plurality of
`lossless encoders to provide a plurality of encoded data
`blocks;
`(d) counting the size of each of the encoded data blocks;
`(e) determining a lossless data compression ratio obtained
`for each of the encoders by taking the ratio of the size
`of the encoded data block output from the encoders to
`the size of the input data block;
`
`NetApp;Rackspace
`
`Exhibit1004
`
`Page 20
`
`NetApp; Rackspace Exhibit 1004 Page 20
`
`
`
`5
`
`6
`
`US 6,195,024 B1
`
`(f) comparing each of the determined compression ratios
`with an a priori user specified compression threshold;
`(g) selecting for output the input data block and append-
`ing a null data type compression descriptor to the input
`data block, if all of the encoder compression ratios fall
`below the a priori specified compression threshold; and
`(h) selecting for output the encoded data block having the
`highest compression ratio and appending a correspond-
`ing data type compression descriptor to the selected
`encoded data block, if at least one of the compression
`ratios exceed the a priori specified compression thresh-
`old.
`
`In another aspect of the present invention, a timer is
`preferably added to measure the time elapsed during the
`encoding process against an a priori-specified time limit.
`When the time limit expires, only the data output from those
`encoders that have completed the present encoding cycle are
`compared to determine the encoded data with the highest
`compression ratio. The time limit ensures that the real-time
`or pseudo real-time nature of the data encoding is preserved.
`In another aspect of the present invention, the results from
`each encoder are buffered to allow additional encoders to be
`
`sequentially applied to the output of the previous encoder,
`yielding a more optimal lossless data compression ratio.
`In another aspect of the present invention, a method for
`providing content independent lossless data decompression
`includes the steps of receiving as input a block of data from
`a stream of data, extracting an encoding type descriptor from
`the input data block, decoding the input data block with one
`or more of a plurality of available decoders in accordance
`with the extracted encoding type descriptor, and outputting
`the decoded data block. An input data block having a null
`descriptor type extracted therefrom is output without being
`decoded.
`
`Advantageously, the present invention employs a plurality
`of encoders applying a plurality of compression techniques
`on an input data stream so as to achieve maximum com-
`pression in accordance with the real-time or pseudo real-
`time data rate constraint. Thus, the output bit rate is not fixed
`and the amount, if any, of permissible data quality degra-
`dation is not adaptable, but is user or data specified.
`The present invention is realized due to recent improve-
`ments in processing speed, inclusive of dedicated analog and
`digital hardware circuits, central processing units, (and any
`hybrid combinations thereof), which, coupled with reduc-
`tions in cost, are enabling of new content independent data
`compression and decompression solutions.
`These and other aspects, features and advantages of the
`present invention will become apparent from the following
`detailed description of preferred embodiments, which is to
`be read in connection with the accompanying drawings.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 is a block/flow diagram of a content dependent
`high-speed lossless data compression and decompression
`system/method according to the prior art;
`FIG. 2 is a block diagram of a content independent data
`compression system according to one embodiment of the
`present invention;
`FIGS. 3a and 3b comprise a flow diagram of a data
`compression method according to one aspect of the present
`invention which illustrates the operation of the data com-
`pression system of FIG. 2;
`FIG. 4 is a block diagram of a content independent data
`compression system according to another embodiment of the
`present invention