throbber
WWMWMWHMH IWWMWHMMWM
`
`USU(}6253264Bl
`
`(12; United States Patent
`Sebastian
`
`(10) Patent No.:
`:45) Date nf Patent:
`
`US 6,253,264 B1
`*Jun. 26, 2001
`
`3-11151
`3115035
`
`341 -‘In’?
`.’
`341.-107
`'F'U"?1'll'12
`H17-"ll'H
`395.’! 14
`
`1 1111997 Panauussis
`5.1111-11.403
`5.110.823 1
`1_1wt.1.=1 Coleman
`
`tjvrmish et «'11-
`1rI9‘_J8
`§.-?l<15='»2 *
`: 1}-figgg Eulhwr --------
`Dmes
`3,
`_.
`3
`5.367.114 1‘
`2.11000 Barhir
`.............
`
`
`
`5,001,515
`
`11110021 Fall-slat.
`
`,
`,
`,
`,,
`_
`,
`[DRl;lGN FAILNI DUC UMEN F5
`11 1129 117 A2
`1111990
`(1211)
`.
`
`OTHER PUB! ICA-I-IONS
`'
`R1'1:::,R,lT,t.-.1a1.,"VLSl uuiw.-rsa1Noisc1..-asCoc1cr,"1\rr15'
`Tech Notes 3301. p. 334 (Mar. 1.1990}.
`
`(54) CODING NE'[‘WORKGR()UPING DATA or
`SAME um); Type mu) 1;: pcxs usmc
`
`1-‘LLI-L 1')/\‘[‘A STRlJC'l‘URE. AND SELI-JCTING
`COMPRESSION FOR INDIVIDUAL BLOCK
`;
`_
`rm
`-
`“ASL 0” NDCK DATA IYPF‘
`lnvenlor: William Seb1’|slifln.I7almnulh. MA
`(US)
`
`(75)
`
`(73)
`
`A.-ssignt-11::
`
`_
`{ ‘ J Nntlr.-1::
`
`Intelligent Compression ’l‘eclIn0l0gles.
`Hlmomh‘ MA (US)
`I
`I
`‘
`_
`_
`Hus patent Lssucd on :1 conlmucd prim-
`cculion upplicaliun filed under 37 CTR
`l.53(d), and is subj::cI lo the lw-:1'11_v year
`""“'i5k'”5 “F 35 U'S‘c'
`
`Subject“*"3"Ydi$l“i“1°r'lh“'°"T'°“hi"3
`patent is mended or adjusted under 35
`U.S.C.
`l54(h)l1y U d.'t_\«'5-;.
`
`"Competitive Parallel Prtsccssing for
`cl al.
`Diner DB.
`C'nmprt:5.5iL'11J of 131111," .N1".'S T91:1‘1N0fe.5' 2301. p. 379111/111;»
`1’ _199m_
`
`(211 App]. No.: 09103154152
`(22)
`I-'ilc.-Ll:
`Mar. 6, 1998
`
`. med by flxamim
`
`'
`
`'1’
`
`_
`
`(60)
`
`_
`(51,)
`
`‘
`Int. (,1.
`
`7
`
`(52)
`
`11.5. CI.
`
`Related U_S_ Applicafion Dam
`Provisional application No t1l'J;'fl3E1 ‘I48
`filed on Mar
`191;-;_‘
`I
`'"
`I
`I
`‘
`............................ ,. GOISP 17130: G06? 31110:
`G06]: SE00; I-103M "NOD; HUJM 7130
`............................... .. 710163; 710.105; 341151;
`141,10?
`‘
`'
`3-—1l1'5l. E15. IU7;
`{5H) Field of Search
`7r_m'102. 503, 504. 505. 5012; 3701474;
`710%‘
`()8
`"
`
`(S6)
`‘
`
`Refumnces Cited
`US. PATENT DOCTIMENTS
`‘
`311904 Carr
`.................................... 3'1"[I1"J4.l
`3-11151
`1111095 Chu
`J‘J51’I£lI0.l3
`4-,"l'E4FJ'? Alhancsc er
`305177!)
`5!l‘}0? R110 el :11.
`......
`631907 'l')'lcr cl :11.
`.
`.. 3953117
`
`5_2‘J3,3".-"9
`5,«1m_.r11'1'!
`*
`5.61 1.54]
`*
`5,632,lI1‘J
`5,638,408 '
`
`Pri1111?ry L'.\':I1111T.r_11:1‘—Tl_10mas Lee
`AS'“'5'mm Exmmner—1aDh Nguyen
`(74) .4Imr1Ir.’_\j. .4g{.’fl£_. or F1'r111—IIa1‘11ilto1:1, Brook, Smilh &
`Reynolds. PC.
`
`{S7}
`
`d\BSTRAC'l‘
`
`A prcfcned coding, nclwork uses an architcclure called 3
`Baa"-0—I*'ilIcr—Rcs11urcc (BI.-R) sys1cn1.
`'l“his. approach inte-

`1 .
`,
`‘
`_
`_.
`1
`.
`.
`’?"""‘?5_ 1"“ "d‘.".‘”“fg"S °f .f°7""“' S',’°‘_’fi°- ‘°'""'°-§5'"" T” 3
`gcnc.1'._1 -purpose <.omprc:~s1t_J:1 loo s1..1'v1ng, 3 W1 1: 1:ar1_1,c_ 01
`dala 1L‘|l'lTlflL‘5. Sourci: (1313 19 parsed 1111:: blocks 01 s1m1[z1r
`data and each parsed blacks are compressed using a respec»
`lively selected cnmprcssinn algorithm. Thu: algnrilhrn can be
`chmacn from 11 5111110 model 01‘I|'1v.-: data 01' can be adaplivc 10
`[he dam In H]: parsed block‘
`[be parsed blocks are men
`combined inlu an uncudcd data file.
`l"or d1:1:0ding, Ihc
`‘
`.
`_
`pmccsa. 1s rcvcrscd.
`
`
`
`29 Claims. 8 Drawing Sheets
`
`—_, ..—.»i~. 1
`-.
`*-
`.\-orm/.;’
`.
`_.?-.
`;m.u"- 1
`'
`my —
`
`
`‘Ca
`-2.0:’:
`r.)-1
`..a:n.e..
`
`an
`
`Oracle 1004
`Oracle 1004
`
`

`
`U.S. Patent
`
`Jun. 20', 2001
`
`ghee: 1 of8
`
`US 6,253,264 B1
`
`SENDING APPLICATION
`
`SOURCE DATA 2
`
`ENCODER
`
`ENCODED DATA 4
`
`DECODER
`

`

`
`DECODED DATA 6
`
`Z
`
`RECEIVING APPLICATION
`
`FIG. 1
`
`PRIOR ART
`
`

`
`
`
`zofiomzmm._
`
`s_mB>mzofiumdm..
`
`
`
`
`N<._.¢Dmomnow
`
`NN.xN_m_._.:__n_
`
`U.S.Patent
`
`<_mmEmo
`
`~§_.._.§.....mm:
`
`m><mw_<
`
`ommm<m.er
`
`vmxwz_mmqn_
`
`s_mB>m
`
`Jun.2(._.2001
`
`mzo_5:w:mz_
`
`oz_wm<n_
`
`
`
`onn_mN__2o_.w:u
`
`><%.._<
`
`Sheet2of3
`
`
`
`m2w_o“_mz<E_20._.m3o
`
`mm
`
`._.zmzOn_S_OU
`
`mm::oo2
`
`mi
`
`mm
`
`omooozmmmm:m_2<m<n_
`
`
`<559N0_u_
`mm.
`
`ozfioozm/
`
`Us6,253,264B1
`
`
`
`
`

`
`U.S. Patent
`
`Jun. 26, 2001
`
`Sheet 3 of 3
`
`US 6,253,264 B1
`
`FIG. 3
`
`

`
`P5U
`
`.m
`
`%
`
`N.4
`
`2,33.6S
`
`m
`
`
`
`
`
`PMmmooo>¢.mm<20¢”.:m_oo_2oEEm.
`
`
`
`
`
`MV.O_.u_
`
`
`
`
`
`ummaoo><w_m<Emmooo.sm»
`
`
`
` .3.+mmaooI022moz_m:m320222:
`
`
`
`
`
`S.K.
`
`
`
`
`
`.w5.5o2_Emom_5_am_mn_n_OE:
`
`n.moz_Em
`
`wmm_>__E
`
`
`
`mm__._o»<s_oz_Ew
`
`
`

`
`U.S. Patent
`
`Jun. 2(._. 2001
`
`Sheet 5 of 8
`
`I.B462.352.,6SU
`
`cmX%
`
`<29Sod
`
`5
`
`EMDOONu.
`
`
`
` mm_Em>zooQ_.wm¢m#mm3<>n_m:o_.<_2z:
`
`
`
`
`
`NEDOO><mm<.201“.
`
`
`
`m;¢mmF_._wmmmzo_wmm_>2oowhzmzoaxmozw<wm:z§_Sn
`
`
`
`
`
`m.O_n_
`
`8mmmmmm
`
`
`
`
`
`
`
`MMDOOozztwO._.mmooo><«_m<O...mmooo><mm_¢O._.mmooo><w_m<O._.
`
`
`
`
`
`
`
`
`
`
`
`

`
`U.S. Patent
`
`Jun. 2(._. 2001
`
`Sheet 6 of 3
`
`US 6,253,264 B1
`
`SAMPLE DATA BASES
`
`1
`
`GENERATE PARSED FILES
`
`PARSED FILES
`
`l
`
`ANALYSIS RESULT FILES
`
`ANALYZE PARSED FILES
`GENERATE PREDICTOR DATA
`
`PDAT FILE
`
`1
`
`FIG. 6
`
`

`
`U.S. Patent
`
`Ju11.26_.2(llll
`
`Sheet 7 or 8
`
`Us 6,253,264 B1
`
`25
`
`ARRAY TO BE CODED
`ARRAY DATA
`‘V
`BUILD MODEL
`
`T1
`
`MODEL
`
`ARRAY DATA
`
`70
`/
`
`73
`
`ENCODED MODEL INFO
`
`75
`
`was
`
`SPLITFER
`
`-77
`
`TWO OR MORE ARRAYS
`
`PROCESS SEPARATELY
`
`YES
`
`MATCHING
`ALGORITHMS
`
`81
`
`
`
` USE
`INITIAL
`SPLITTER
`
`
`
`
`No
`
`79
`
`
`
`MATCHING
`* LGORITHM
`
`85 ~.
`
`NO
`
`UNMATCH ED
`VALUES
`
`OVE§§:qZ§Lg:RAY
`
`ARRAYS CONTAINING
`
`MATCH DATA
`
`83
`
`CONVERT TO |NT32
`
`
`
`
`USE
`NUMERIC
`TRANSFORM
`
`’39
`
`NUMERIC
`ALGORITHMS
`
`91
`
`OFFSETS FROM
`MODEL
`
`95 SPLI'I'I'ER
`
`TWO OR MORE ARRAYS
`
`PROCESS SEPARATELY
`
`
`
`YES
`
`REPEAT
`TRANSFORM
`
`99
`
`100
`
`FIG 7
`
`

`
`U.S. Patent
`
`Jun. 26, 2001
`
`Sheet 3 of 3
`
`US 6,253,264 B1
`
`101
`
`ARRAY DATA IN
`
`103
`
`RANGE
`
`REMAPPER
`
`
`
`
`
`
`1 O5
`
`EXTRA BITS
`
`TO PACK
`
`MTF CODER
`
`109
`
`
`
`RANGE CODES
`
`NO
`
`
`
`RANGE CODES TO
`
`LOW LEVEL CODER
`
`FIG. 8
`
`

`
`US 6,253,264 B1
`
`1
`CODING NETWORK GROUPING l)A'I'A OF
`SAME DATA TYPE INTO BLOCKS USING
`FILE DATA STRUCTIJ RE AND SEI.F.C'I'I NG
`COMPRESSION FOR INDIVIDUAL BUOCK
`BASE ON BLOCK DATA TYPE
`
`RELATED APPLICATIONS
`
`This application claims the benefit of US. Provisional
`Serial No. (tU,=’0.'l6,S43 filed Mar. 7. 1997. the teachings of
`which are incorporated herein by reference in their entirety.
`BA(TI\"(}R()UNl) ()I- THE l'NVl£N’l'I()N
`
`ID
`
`High performance data compression] systems use models
`of the data to increase their ability to predict values. which
`in turrt leads to greater compression. The best models can he
`achieved by puilding a compression system to support a
`specific data format. Instead of trying to deduce a crude
`model from the data within it specific lile, rt form at-spec'il'.lc
`compression system can provide it precise prc-determined
`model. The model can take advantage not just of the file
`formal structure, but also of statistical data gathered from
`sarnplc databases.
`Previous efforts at form at—spe<:ific compression have been
`focused on solutions to a few individual formats rather than
`on the development of at generalized method that could be
`adapted to many formats. The models which have been
`created typically involve a small number of components.
`This works adequately when most of the data is included in
`a few components, such as an image file having mostly red,
`blue. and green pixel values. Hut mny formats are best
`modeled using a large number of components, and the
`previous systems are not designed to build or encode such
`models.
`
`SUMMARY U1" Tl-IE INVIENTIIJN
`
`A preferred coding system in accordance with the inven-
`tion solves both of these problems in the prior art: it provides
`a generalized solution which can he adapted to handle a wide
`range of formats. and it effectively handles models that use
`large numbers of components. The system involves new
`approaches at many levels: front the highest (interface) level
`to the lowest (core encoding algorithms) level. The coding
`network includes a compression network for encoding data
`and a decompression network for decoding data.
`At the highest level, a pr-:.l'e.rrcd compression system in
`accordance with the invention uses an architecture called a
`Base-Filter-Resource (BFR) system. This approach inte-
`grates the advantages of format-specific com prcssion into a
`gcrIeral—purposc compression too] serving a wide range of
`data formats. The system includes filters which each support
`a specific data format, such as for Excel XLS worksheets or
`Word DOC files. The base includes the system control
`modules and a library used by all the filters. The resources
`include routines which are used by more than one liltcr, but
`which are not part of the base. lfa filter is installed which
`matches the format ofthe data to he encoded, the advantages
`of forntnt-specific compression can be realized for that data.
`Otherwise. at "generic" filter is used which achieves perfor-
`mance similar to other non—speciIic data compression sys-
`tems {such as Pl(Zip, Stacker. etc).
`At the next level, the system preferably includes il method
`for parsing source data into individual components. The
`basic approach, called "structure flipping," provides a key to
`converLi rig format
`information into compression models.
`SLructurc flipping rcorganizcs the inform:tLion in a file so that
`similar cornpone-nts that are normally separated are grouped
`together.
`
`IL}
`
`3!]
`
`3-5
`
`4!)
`
`St)
`
`Ln no
`
`oil
`
`($5
`
`2
`Upon this foundation are a numbers of tools. such as:
`a language to simplify the creation of parsing routines;
`tools to parse the source data using this method into
`separate components; and
`tools to generate models for individual components by
`automated analysis of sample data bases.
`These tools can be applied to the filters for a wide range of
`fllc and data types.
`The compression system preferably includes tools called
`customized array transforms for specific liltr.-rs and for
`certain types of filters. These tecltniqucs handle a specific
`file type or certain data constructions used by several file
`types, such as encoding two dimensional arrays of data as
`found in many database formats.
`At the low-level of the system are preferably a number of
`mechanisms for encoding data urrnys, including:
`new low-level encoding algorithms;
`melltods for integrating a large number of transforms and
`encoding algorithms:
`methods for eliminating overhead so that small data
`blocks can he ellicicnfly coded; and
`a new method for
`integrating both static models
`(determined from statistical analysis of sample
`databases) and dynamic models (adapted to the data
`vvilliin a panicula: array) into the encoding of each
`component.
`A preferred method of encoding source data: comprising
`parsing the source data into at plurality of blocks. The parsed
`blocks typically have it rlifferent format than the source data
`format. In particular, similar data from the source data are
`collected and grouped into respective blocks.
`For each block, a compression algorithm is selected from
`a plurality of candidate com prcssion algorithms and applied
`to the hlock. The compression algorithms can be determined
`based on the amount of data in the respective block.
`Iittrthermore, the compression algorithm can be adapted to
`the respective block,
`including the use of a customized
`transfonn. The selection of an algorithm can also be based
`on a compression model, which is derived from the format
`of the source data. The compressed data from the plurality
`of blocks are then combined into encoded data.
`
`The coding network can also include a decompression
`network to convert the encoded data back into the source
`
`data. First the data is decoded and then the parsing is
`reversed. In a lussless system, the resulting data is identical
`to the source data.
`Embodiments of the invention preferably take the form of
`machine executable instructions embedded in a machine-
`rcatlahte format on a Cl)—R()M. floppy disk or hard disk, or
`another machine-readable distribution medium. These
`instructions are executed by one or more processing units to
`implement the compression and decompression networks.
`BRIEF [)l_'"..‘~3r’..'.llll3"l'l(3l'Vl 0|’ 'l'l{E DRAWINGS
`
`The foregoing and other objects. features and advantages
`of the invention will be apparent from the following more
`particular description of preferred embodiments of the
`invention. as illustrated in the accornpanying drawings in
`which like reference characters refer to the same parts
`throughout the different views. The rlrawirtgs are not nec-
`essarily to scale, emphasis instead being placed upon illus-
`trating the principles ofthe invention.
`FIG. I is a high-level
`lrloclr diagram of a typical data
`compression system.
`l.’l{j. 2 is a block diagram ofa preferrccl encoder using the
`BFR system in accordance with the invention.
`
`
`
`

`
`US 6,253,264 B1
`
`3
`FIG. 3 is it How chart of an array coder.
`FIG. -'I- is it block diagram of the string coder 40 of FIG.
`
`3
`
`FIG. 5 is El block diagram of the Host coder 50 of FIG. 3.
`FIG. 6 is a flowchart of a preferred autnmztterl analysis
`system.
`FIG. 7 is at llowchttrt ovcrviewing the high-level of the
`integer coder.
`FIG. 8 is a flowchart ovcrviewing the mid-level of the
`integer coder.
`
`DETAILED l)I3SC'RlP'I'IONS OF PREI-‘ERRED
`EMBODIMENTS OF THE INVENTION
`
`5
`
`ll}
`
`ELI
`
`3!]
`
`4
`invention. The encoder 3’ is based on the use of :1 plurality
`of filters ltln. .
`.
`. , l'I.lx,
`.
`.
`. , II}: which serve specific tile
`formats. For example, one filter 10:1 might support several
`versions ofthe DBI’ database forrrtat, ‘While another lilter 10:
`might support seve rat] versions of the DOC‘ format used by
`the Microsoft Word software program. The individual filters
`provide respective selection criteria 12 to at filter selection
`system 22.
`The filter selection system 22 receives the source data 2
`and checks the selection criteria l2.n, .
`.
`.
`, 11:‘, .
`.
`.
`, 12; of
`all filters I0rt,. .
`. , 10x, .
`.
`. , I0: installed in the system to
`see it’ any of them support the source data's format. It" not.
`a "generic" litter is used which provides compression per-
`formance similar to other generic compression systems,
`such as Lcmpcl-Ziv (L2) engines. In it panieular preferred
`errthodimeut of the invention. the generic compression sys-
`tem employs an S211’ engine as described by Mr. Schindler
`in US. application Ser. No. t]8.’970,32D fded Nov.
`I-1, 1997.
`the teachings of which are incorporated herein by reference
`in their entirety. The descriptions of the network will pri-
`marily cover the situations in which a filter to support the
`date format is successfully found.
`Briefly. it parsing system 24 parses the source data 2 into
`:1 large number of parsed Arrays 25, which each include all
`the instances of it particular component in the source. tile.
`The term “Arraty", described in further detail below,
`is
`capitalized to indicate that this refers to a type of structure
`specific to the network, as opposed to the usual use of the
`term "array“. The algorithm used to parse the source data 2
`is called "Structure Flipping“, an important ttspect of the
`network, described in further detail below.
`The BFR system uses an automated analysis system (also
`described in detail below) to build ti. model for each Array
`25, which will he used when the Arrays 25 are encoded in
`an array coder 28. In some cases, better compression can be
`achieved if customized array translorrns 26 are first used.
`While the models generated by the automated analysis
`systc m process each component Array 25 independently. the
`customized routines can take advantage of inter-component
`relationships.
`The array coder 28 encodes the Arrays 25 using :1 count-
`depcndent modeling system. which includes El mixture of
`static models {constant
`for a ll databases) and dynamic
`adjustment to each particular Array 25.
`The tiller lllx has components to support each ofthe four
`sections of the encoder 3’. The selection criteria llx, deter-
`mined by the file format specification, includes inforrtiation
`needed to recognize files served by the filter 10.x‘ such as byte
`values at the beginning of the tile or tile title sutfices. Parsing
`instructions 14, also determined by the file format
`specification,
`include the information needed to parse the
`source tile into parsed Arrays 25 having all the instances of
`a particular com poncnt
`in the source tile. The custom
`in 1:
`_ component models 16 are constructed by referencing both
`the file formal specification and sample databases. They can
`take advantage of inter-component relationships to trans-
`form some of the component Arrays 25 to make them more
`compressible. Encoding parameters 18 are generated by an
`autornuted analysis system which examines component
`Arrays 25 parsed from :1 large number of sample databases.
`The encoding parameters 18 provide the data necessary to
`support the count-dependent modeling system in the array
`coder 28.
`l~‘ll.'1‘LiR Sl:1l.E.(.'l1DN SYS'l‘T_~‘.M
`A preferred compression system uses an cxptintlablc
`architecture that allows users to support new or updated
`
`FIG. 1 is :1 high—levcl block diagram of :1 typical data
`compression system. The primary components ofthe system
`are a sending application 1, an encoder 3, a clccodcr 5. and
`a receiving application 7.
`The sending application 1 provides source data 2 to be
`encoded hy the encoder 3. The sending application can be a
`file archivcr, a communication system, or any other appli-
`cation with a need to represent data using fewer bytes than
`in the native format.
`The encoder 3 receives the source data 2 and uses data
`compression algorithms to represent the data using, fewer
`bytes. The typical implementation ol‘ the encoder 3 is as
`software running on a general purpose computer, although
`the algnritlims to be described may also be implemented on
`specialized hardware.
`The output 01' the encoder is encoded data 4, which can be
`stored in memory (in which case the encoding allows more
`source data to be stored within the same Itardwarc), or
`transmitted to the decoder5 (in which case the encoding the
`source data to be transmitted faster when the transmission _
`channel bandwidth is limited).
`The decoder 5 retrieves or receives the encoded data 4 and
`reverses the algorithms used to encode the data at its to
`reconstruct the source data as decoded Llaltt 6. In a lossless
`system, the decoded data 6 is identical to the source data.
`Depending on the application, hewt:-ver, some data loss may
`be acceptable. As with the encoder 5,
`the decoder 5 is
`typically embodied as software running on a general purpose
`corn pttter, but can be implemented in specialized hardware.
`The receiving application 7 receives the decoded data 6
`for processing.
`A preferred compression engine includes liltcr-based
`encoders and decoders which can be utilized by a large
`number of st: riding ztpptieations l and receiving applications
`7. While the applications [,7 that storetretrievc or transmit.-'
`receive the encoded data are not part of the compression
`system, they may adjust encoding and decoding opera! ions
`according to system conditions such as available transmis-
`sion channel bandwidth.
`
`4!)
`
`St)
`
`The descriptions to follow will mostly cover the encoder.
`The dccodcrjust reverses. the ::ncoder‘s actions,so rt descrip-
`tion of the encoder is su Ificicnt
`to describe the overall
`eneodingjdccocling system. The term “File" will be used to
`describe at block of source data, which matches the conven-
`tional usngc, but which should also he understood to be
`applicable to many other types ofdatta blocks such as a data
`exchange in it Client-Server application.
`ENCODER
`
`FIG. 2 is a block diagram of it preferred encoder using :1
`Base-Filter-Resource (BI 11) network in accordance with the
`
`
`
`till
`
`65
`
`

`
`US 6,253,264 B1
`
`5
`formats by adding or replacing filters it]. As noted above, a
`litter 10 can support more than one format. For example. :1
`filter called “l.D'l'US123v-‘l-" might support all tile formats
`associated with the Lotus [23 program, such as the current
`WK-at format and the earlier WK3 and EMS formats. If :1
`
`Filter was later developed for a new Lotus WK5 formal, a
`user replaces the old filter with one supporting the new
`format and which would also he hackwardly compatible
`with the earlier filter with respect to the older formats.
`In the Microsoft Windows environment, each litter is :1
`separate dynamically linked lihrary (t)t.l.) tile. In addition
`to data tables. a filter Includes executable code.
`
`It]
`
`If a filter is found for :1 file, then it is dynamically linked
`and then called to encode the file. The encoded data stream
`includes an identification (ID) code indicating which filter
`was used to encode the data, and the decoder has to have a
`corresponding decoding filter.
`To implement
`the filter selection system 22, the hase
`module maintains a registry of the currently installed files,
`which includes the data need to identify source liter. of that
`type as well as the path to the DLL. This registry is updated
`each tirne at filter is added or replaced. The identification data
`can include both a list of tile title. sufficcs and a method to
`identify files via byte values near the beginning of the Ede.
`The identification procedure uses a search tree method to
`reduce the time needctl to check the list when many filters
`are installed. From this registry, the appropriate filter It) is
`invoked for the file type.
`To handle compound file formats such as Microsoft's
`01.132, one filter can invoke another filter to encode a
`sub-file or stream within the compound tile. For cxa rnplc, :1
`Winword Filter might call an Excel Filter to encode some
`Excel data which was embedded within a compound docu-
`ment tile.
`PARSING SYSTEM
`
`large
`As noted above, the parsing system 24 creates ti
`number of pa rscd Arrays 25. Each parsed Array 25 includes
`all instances of a particular cornponcnl in 11 File. A separate
`component
`is created for each member of each dc-lined
`structure type.
`
`EXAMPLE 1
`
`For example, consider the following, description of a record
`dclined in a file format description.
`(Tell Value:
`
`The ti:-rmat uses a special method of coding numbers in an
`internal numher type. called an Rh’. number, to save
`memory and disk space.
`Record Data:
`
`Olisct
`
`Name
`
`Size
`
`Contents
`
`4
`0
`8
`
`IEJ
`
`we
`col
`frnl
`
`nun-r
`
`3‘.
`3‘.
`1
`
`4
`
`Row numhrrr
`ffolurrtrt nurrtlaer
`Index to the record that includr.-5
`the cell format
`the coded number
`
`The parsing system 24 creates parsed Arrays 25 for each
`of the four components of this record. Each time an R.l-3'.
`record is found, an entry is added to each array. The parsed
`Arrays 25 are much more compressible than the raw source
`data, as shown by data from five records:
`
`35
`
`4U
`
`5U
`
`an an
`
`t'.iU
`
`65
`
`Rec
`
`Row
`
`1
`3
`3
`4
`5
`
`L'rxtZIl)t3l
`Uxlltlul
`L'IxE|t}lJl
`tixtZ|tJI't1
`EJ.\:t]rJCl]
`
`Col
`
`Uxtlml‘
`llxtltltlj
`nxtJrJn-1
`tIt)clJ'DlJu
`lJ)ctJ(Jt)‘t
`
`lndett
`
`Oxf)DS5
`0x005 f
`0x005}
`rlxousf
`riicr:-Usf
`
`Nulrt Vol
`
`Ux3tT1e 100
`f}|x3tI1a3tJt)
`nx3tf1duuo
`ox}-t1'1.-ant:
`ax3.a'1.-Jana
`
`The source data would appear almost random to a byte-
`matching algorithm (shown in hex):
`0.| UOD2D0550ftfl0elli3fDl{l0D30D3fDDD[la3fl
`3l'0l [tt]U|4[Jt]S3DUCl0r_l0li 3f
`01 {)0 06 00 Sf [10 D0 «:4 cl fl 3l‘U1{lfJ{t7 Of} 51' 00 (ID :13
`I] fit"
`But when handled as four separate Arrays 215 (Row, Col,
`Index, and Nam) using algorithms. optimized for each
`component,
`the data can compress significantly. This
`approach is referred to herein as "structure flipping,” because
`of tilt‘: way it rearranges the data.
`To encode together the instances of the same component
`i.n a database is a common compression technique, and the
`file format is often used to identify such components. But in
`the prior art. this approach has heen limited to isolating a few
`major components of a specific file formal. and the file
`format description used only as a technique of finding, these
`major components. In accordance. with preferred embodi-
`ment.-=. of the invention, a file format is actually converted
`into a compression model, which is then interfaced into an
`automated system for analyzing the data and optimizing the
`coding performance of each coniponent. While this com-
`prcssion model is usually entranced by the customized array
`transforms 26. it is still an aspect of the overall system.
`The extent
`to which preferred approaeltes ditfer from
`prior art systems is re flecled in the new tools constructed in
`order to implement it. This approach creates a large number
`of!\rrays 25. Eacli Array 25 requires a dilfe rent compression
`model, and the number of entries in each can change
`significantly from tile to tile. In order to work elI'ect_ively in
`such a case, prefe-rred embodiments of the invention employ
`new approaches in how the low-level coding is handled such
`as the elimination of overhead, count-dependent processing,
`and the use ofa processing network instead of fixed coding
`algorithms. Tht:s.c tech fliques, described above, are pro fcrrcd
`in order to handle the output of the parsing system 24.
`All of these implementation features are integrated at a
`high level. so that high-lcvel calls can handle all of the
`operations needed to initialize the system. parse the data,
`and encode the Arrays.
`In accordance with it particular preferred embodiment of
`the invention,
`the lite parsing system 24 uses three sult-
`systems:
`
`FLL Li_fl L'l"Fi:'. R
`A R R AYE
`t'vs't'ttt.'c'T'
`
`[npLtI‘.O'LttpttI InIL-t'1't|ce
`To store the data for It single component
`To paw: the input into Arrays or to write the decndecl
`An‘u_\':i into the Output stream
`
`A plurality of FILE_BUF'f'ER routines provide a llexible
`inputtoutput interface using l*‘lI.E _BUFFl3.R structures. so
`the same set of parsing routines can he used regardless of the
`inputtoutput formats. This allows the compression system to
`be independent of the lit] format. The file parsing system
`preferably supports two file formats: [305 and DLI-L2, but
`this can expanded to other file formats as well as to inter-
`
`
`
`

`
`US 6,253,264 B1
`
`7
`faces to communication ports and other U0 options. 1"or
`DOS files, the logical stream positions are usually the same
`as the raw file positions. The t'tLt:'Z format requires an
`extensive layer of code to follow an OIEZ stream within the
`container file.
`A plurality of ARRAY routines provide at system for
`managing the process of initializing, adding data to.
`encoding, and freeing arrays of data. Each ARRAY structure
`includes one array of data and the information needed to
`manage these processes. such as the number of entries in the
`Array 25, the size of the bulTer currently allocated for the
`array. and informatiort which will be useful in encoding the
`array.
`In particular, ARRAY_XX structure stores all the data
`from a component, and includes mechanisms to sttpport fast
`sequential
`rcadfwrite access to the data, an expandable
`hul’l'er'for the data. and storage ofrnoclel data. This structure
`also allows a simple set of tooLs to handle all allocation,
`parsing, and coding functions.
`the
`XX nomenclature
`indicates that slightly different ARRAY structures are
`dclirted for different data types, such as dilierent integer
`word sizes, strings, and flouting point types, although many
`of the processing routines work on all of the variations.
`A plurality of [’VS'l‘RU(.‘l' routines make up an auto—
`mated system for parsing the file data. During encoding.
`they read data from the FTLE_ BUFFER structure and write
`data into Arrays 25. During decoding, they take data from
`the Arrays 25 and write it into the l~TI.E _BLIt~‘t-"ER struc-
`tures.
`routines can take a high-level description of a
`file format and handle all the tasks associated with managing
`the parsing as well as the encoding ofthe parsed data. Many
`of the [Ville-rs
`II} will also include custramiwcd parsing
`routines which call the FILE BUF'l'~’El1 and ARRAY rou-
`tines rlirectly instead of using the PVSTRUCT routines. But
`an aspect of the parsing system 24 is the PVSTRUCT
`routines, and this discussion will be concerned primarily
`with their operation.
`A PVSTRIJCT structure includes all information needed
`to parse and code a data structure defined in a file format.
`This usually involves multiple components. so it provides
`and handles individual Arrays 25 for each component. The
`term "pvstruct"' comes from Predictor Variable Structure,
`where Predictor refers to the usc of statistical data gained
`from analyzing a large. number ofsamplc files to assist in the
`encoding of the data, and Variable indicates that it supports
`structures where the component counts can vary at run time
`for each instance (called "dynamic structures” in some
`textbooks).
`A parsing language is a defined instruction set which
`provides a concise way to specify the parsing of a
`PVS']‘RU("l‘ structure. The instruction set includes support
`for a wide range of dynamic variation, including where the
`counts of components are dependent on lile counts of other
`components as well as on external factors. It incorporates
`other lilc forirtat data, such as defined values or range
`restrictions. It also provides a means of exchanging data
`between the parser and the filter.
`The compression system is based on parsing a file so as
`to place items together that are similar. In the following
`description ofthe compression system, examples illustrating
`some structures which might be found in a typical spread-
`sheet file t"on'nat.
`
`EJCAMPLE 2
`
`Spreadsheet files typically include rt series of records.
`Each record |)e-gins. with a 16-bit word indicating the type of
`record followed by a 16-bit word giving the number ofbytes
`
`1U
`
`EL}
`
`.10
`
`35
`
`4!]
`
`5!}
`
`111 an
`
`(it)
`
`{i5
`
`8
`in the body of the record. The file specification then
`describes the contents of each record. A spccillcation for a
`TABI..E SIZE record might be:
`
`RecType-=-Ox4{l'_’
`Rec Body Sizc=12
`Re-cord Data:
`
`Ctfliaet Name
`4
`Minnow
`is
`Maxliow
`-3
`Minuet
`Jt't
`Mnxtfoi
`13
`(Reserve-rl'}
`
`Size Cnntcntti
`2
`First defined row on the sheet
`2
`Last defmed row on the sheet, plus I
`2
`First defined column on the sheet
`2
`Last dctined cnlutnn on the sheet, plus 1
`4
`Reserved
`
`The re. can he more than one such record in the data. When
`there is more than one record,
`the values of the same
`component in each record are going to be similar to each
`other. Tl'tl.'~‘. enables the compression system to take advan-
`tage of the similarities to increase the compression ratio.
`The compression system is preferably based on the
`ARRAY_)O{ structures, including:
`
`r\R.R_AY_ [19
`
`ARRAY lt'.1:
`
`ARIUW _3.“.
`.»\RItAY__F33:
`AI-‘.RJ\‘i'_Fu4
`Alv‘.R.i\‘f__l<'ttiJ:
`AltR..?tY_STR.:
`
`for S-bit darn its-.1115, treated by the unending systern as
`being unsigned
`for ltivhir. data items, Lrtcatcd by the encoding system as
`being signed
`for .32-bit data items. treated by the encoding system as
`being signed
`for .’-2‘-hit llonls
`for ti-t-hit tlonts ("doubles")
`for 80-bit floats ("long duublr:s"::
`each entry includes several bytes whose count is given
`hy n size (string size}.
`
`For the 'l"ABlJ3. SIZE record, live arrays would be created
`{referring to the record specification). Each time a record is
`parsed. an entry is added into each of the live arrays.
`When all
`the records had heen parsed. a [u action is
`invoked to encode the items in each array. This function
`includes at large number of algorithms, which can exploit a
`wide variety of relationships to achieve maximum compres-
`sion. The irnptementation is extremely ulficicnt
`in that it
`requires virtually no overhead: an array with only one entry
`can cven be ctliciuntly compressed. This is necessary
`because the parsing system can transform a file into ti huge
`number of these small arrays. The encoding function can
`also free the data after it finishes encoding it.
`The decoding process reverses the sequence. First, each of
`the component arrays is decoded. Then the parsing is
`reversed.
`
`As this type of parsing has to he done extensively. a
`number of tools are used to simplify the process. They are
`based on the use of instructions called vstruct_defs. To
`continue with the TABLE SIZE example, the instruction set
`to parse such :1 record would be:
`
`If the lir.-at 3 instruction.-. always provide general info about the record:
`8,
`if nLII:t1 instructions in this set
`J2,
`U size of the record body, which is a fixed sat of IE bytes
`5.
`.-3' hum arrays to he created
`It the rcntaining iruztructious tell how to parse the record:
`l‘.’\“I't t'i_.
`.-'1' Parse it
`to hit integer [or the first component t“.\«'linRov.'“}
`INTII5.
`
`
`
`

`
`US 6,253,264 B1
`
`9
`
`-oonlinucd
`
`lNTlo,
`IN'T]b.
`INTJ-1 l:
`
`.-".t Parse a 3'2 bit integer for lhc. last component E“ Rescn‘ed"]
`
`Tht: vslrLIct_de1' instructions art: 16 bit integers. The first
`three words always include the same general info about the
`record. The rust of the words include instructions which tell
`the automated parsing system how to proceed when encoun-
`tering such a record.
`in this case. the instructions are this
`simplest possible—just a list of data types for each compo-
`nent of the record. The parser would create an array for each
`component, and load one entry of that type into the array
`each lime it parscs a rccord of that type;
`Functions are then provided to automate the parsing and
`encoding of records based on thesr: instructions. Again, the
`decoding process reverses the actions of the encoder.
`The vstrucl_ clcf instructions can hand]: situations

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket