throbber
United States Patent
`US 6,253,264 B1
`(10) Patent No.:
`(12)
`Sebastian
`(45) Date of Patent:
`*Jun. 26, 2001
`
`
`US006253264B1
`
`(75)
`
`73)
`
`Assi
`
`:
`
`ion
`
`Technologi
`
`(54) CODING NETWORK GROUPING DATA OF
`SAME DATA TYPE INTO BLOCKS USING
`FILE DATA STRUCTURE AND SELECTING
`COMPRESSION FOR INDIVIDUAL BLOCK
`BASE ON BLOCK DATA TYPE
`.
`ene
`:
`Inventor: William Sebastian, Falmouth, MA
`(US)
`Intelligent C
`Assignee Fakaoeth MA(US) EMMIS,
`(73)
`(*) Notice:
`This patent issued on a continued pros-
`ecution application filed under 37 CFR
`1.53(d), and is subject to the twenty year
`tent
`t
`isi
`f 35 US.C.
`"34a)(2).mm provisions ©
`‘
`Subjectto any disclaimer, the term ofthis
`patent is extended or adjusted under 35
`
`5,684,478 * 11/1997 Panaoussis ......ssessesseeeeees 341/51
`
`cece 395/785
`5,708,828 *
`1/1998 Coleman...
`1/1998 Gormish et al... 341/107
`5,710,562 *
`
`
`oeee : vit900 purser sesensentnsnennnnennsenee sOnor
`olmes ....
`.
`864,
`2/1999 Barbir oe eeeeereeeee 341/107
`5,867,114 *
`
`3/1999 Mochizuki et al... 707/102
`5,881,380 *
`secsssesssssseonen 707/104
`5,983,236 * 11/1999 Yager et al.
`
`5.991.515 * 11/1999 Fall et abe cccceccsccsssseesseeern 305/114
`FOREIGN PATENT DOCUMENTS
`(EP).
`(er)
`OTHER PUBLICATIONS
`
`8/1996
`
`0 729 237 A2
`
`Rice, R.F., et al., “VLSI Universal Noiseless Coder,” NTIS
`Tech Notes 2301, p. 234 (Mar. 1,1990).
`Diner, D.B., et al., “Competitive Parallel Processing for
`Compression of Data,” NTIS Tech Notes 2301, p. 379 (May
`
`* cited by examiner
`
`(21) Appl. No.: 09/036,052
`(22)
`Filed:
`Mar. 6, 1998
`animar,y bxaminernoms Lee
`Related U.S. Application Data
`
`
`
`
`60) filedonMar. 7Provisional application No. 60/036.548. ssistant Examiner—Tanh Nguyen
`
`
`(60) 1997 application
`No.
`60/036.548,
`(74) Attorney, Agent, or Firm—Hamilton, Brook, Smith &
`filed
`on
`Mar.
`7,
`Reynolds, P.C.
`(SL) Unt. C0 eeeeeeecccssssenneees G06F 17/30; GO6F 3/00;
`y
`GO6F 5/00; HO3M 7/00; H03M 7/30
`(57)
`ABSTRACT
`(52) U.S. C1. ceeeeeeeseeeseeennees 710/68; eesaay A preferred coding network uses an architecture called a
`:
`.
`Base-Filter-Resource (BFR) system. This approach inte-
`(58) Field of seaTIOSSOs,SokS08S08;eSa7 grates the advantages of format-specific compression into a
`?
`?
`?
`?
`410/65 68
`general-purpose compression tool serving a wide range of
`;
`data formats. Source data is parsed into blocks of similar
`References Cited
`data and each parsed blocks are compressed using a respec-
`tively selected compression algorithm. The algorithm can be
`U.S. PATENT DOCUMENTS
`chosen from a static model of the data or can be adaptive to
`the data in the parsed block. The parsed blocks are then
`oerosy . siyto0s on sereeeseneseseseseneeesescesseseeees ete combined into an encoded data file. For decoding,
`the
`:
`467,
`Us ieeecceeccssecesescenceeeecneecneeees
`
`4/1997 Albaneseetal......
`"395/200.13
`5,617,541 *
`Processis reversed.
`
`5,632,009 *
`5/1997 Raoetal.
`........
`.. 395/770
`5,638,498 *
`6/1997 Tyler et al. oo eeeceeeeseeeeeee 395/117
`
`(56)
`
`29 Claims, 8 Drawing Sheets
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
` N
`
`28
`ARRAY
`x
`
`<a
`2g
`‘
`32, STRINGS,
`Y
`ARRAY,7”|
`N
`STRING |}.
`cover
`|40
`FLOATS,
`ARRAY,
`
`4,
`

`
`
`CODER |50
`
`FLOAT
`
`
`
`INTEGER|,
`
`CODER [3g
`
`
`
`|S
`
`Commvault Ex. 1023
`Commvault v. Realtime
`
`US Patent No. 9,054,728
`
`Page1
`
`Page 1
`
`Commvault Ex. 1023
`Commvault v. Realtime
`US Patent No. 9,054,728
`
`

`

`U.S. Patent
`
`Jun. 26, 2001
`
`Sheet 1 of 8
`
`US 6,253,264 B1
`
`SENDING APPLICATION
`
`SOURCE DATA 2
`
`ENCODER
`
`ENCODED DATA 4
`
`DECODER
`
`3
`

`
`DECODED DATA 6
`
`RECEIVING APPLICATION f
`
`FIG. 1
`PRIOR ART
`
`Page 2
`
`Page 2
`
`

`

`U.S. Patent
`
`Jun.26, 2001
`
`Sheet 2 of 8
`
`US 6,253,264 Bl
`
`cVIVOsounos-t
`
`zz/|_MALS
`
`NOILOaTaS
`
`
`
`WALSASNOILOa1g$
`
`VIYSLIYS
`
`ZELOUXSEeSL
`
`8c
`
`G2
`
`QAqdOON3
`
`Vivd
`
`6¢
`
`SAVEYV
`
`Ge
`
`
`
`9¢°]qaZIWOLSNO
`
`AVY
`
`SWYHOASNVALL
`
`WOLSND
`
`LNANOdWOO
`
`SaINGOW
`
`OL
`
`ONIGOONA
`
`SYSLANVaVd
`
`81
`
`¢Old
`
`ve
`
`ONISHVd
`
`qasuvdvl
`
`SNOILONUYLSNI
`
`WALSASONISdVd
`
`dyatd
`
`Page 3
`
`Page 3
`
`
`
`
`
`
`

`

`U.S. Patent
`
`Jun. 26, 2001
`
`Sheet 3 of 8
`
`US 6,253,264 B1
`
`
`
`FIG. 3
`
`Page 4
`
`Page 4
`
`

`

`U.S. Patent
`
`Jun.26, 2001
`
`Sheet 4 of 8
`
`US 6,253,264 Bl
`
`Ov7
`
`OILVLS)
`YsdqdO0OAVeeVWOwds(JAGOW
`
`
`
`
`
`
`
`VIVdGONIYLSQSL0IdsddAOLSI]
`
`Lv
`
`
`
`YAHOLVWONIYLS
`
`SONIYLS
`
`dsSWldd
`
`8c
`
`
`
`
`
`
`
`Lv4S3dO0HOLVWWSONINLSGSHO.LVWNN
`
`
`
`YAqdO9AVaeevVOL
`
`
`
`Ysaqd09LxXal
`
`vOlas
`
`Page 5
`
`Page 5
`
`
`
`

`

`U.S. Patent
`
`Jun.26, 2001
`
`Sheet 5 of 8
`
`OG|Vivd
`
`1vO14
`
`LS
`
`Ysd0921
`
`
`
`92Ysaq0oAVaeVWOds
`
`SANTWAGSHOLVAINN
`
`
`
`YsaLYSANOOOLASVE
`
`
`
`STIVYsSLITSYysNOISHSANO9SLNANOdxad019SVSSILNVW019
`
`
`
`
`
`
`
`
`
`YAGOOONIYLSOLYAd0oAVaeeVOLYAdOoAVeavVOLYad0o0AvaavOL
`
`US 6,253,264 Bl
`
`
`
`
`
`GS‘Old
`
`OV828c82
`
`Page 6
`
`Page 6
`
`
`
`

`

`U.S. Patent
`
`Jun. 26, 2001
`
`Sheet 6 of 8
`
`US 6,253,264 B1
`
`SAMPLE DATA BASES
`
`PARSED FILES
`
`GENERATE PARSED FILES
`ANALYZE PARSED FILES
`GENERATE PREDICTOR DATA
`
`ANALYSIS RESULT FILES
`
`PDAT FILE
`
`FIG. 6
`
`Page 7
`
`Page 7
`
`

`

`U.S. Patent
`
`Jun.26, 2001
`
`Sheet 7 of 8
`
`US 6,253,264 B1
`
`ARRAY TO BE CODED
`
`No a
`
`ARRAY DATA
`
`iBUILD MODEL
`
`
`MODEL
`
`ARRAY DATA
`
` USE
`
`INITIAL
`
`SPLITTER
`
`NO
`
`71
`
`73
`
`ENCODED MODEL INFO
`
`SPLITTER
`
`YES
`
`TWO OR MORE ARRAYS
`PROCESS SEPARATELY
`
`19
`
`
`
`YES
`
`MATCHING
`ALGORITHMS
`
`81
`
`UNMATCHED
`VALUES
`
`ARRAYS CONTAINING
`MATCH DATA
`
`\’23
`
`CONVERTTO INT32
`
`
`
`MATCHING
`
`ALGORITHM
`
`85
`NO
`
`
`OVERSIZE ARRAY|
`HANDLER
`
`
`89
`
`NUMERIC
`ALGORITHMS
`
`TWO OR MORE ARRAYS
`PROCESS SEPARATELY
`
`YES
`
`REPEAT
`TRANSEORM
`
`99
`
`TO MID-LEVEL CODER
`
`100
`
`FIG. 7
`
`Page 8
`
`91
`
`95
`
`OFFSETS FROM
`MODEL
`
`SPLITTER
`
` USE
`
`
`NUMERIC
`SFORM
`
`<>
`;
`
`
`Page 8
`
`

`

`U.S. Patent
`
`Jun. 26, 2001
`
`Sheet 8 of 8
`
`US 6,253,264 B1
`
`101
`
`ARRAY DATA IN
`
`103
`
`RANGE
`
`REMAPPER
`
`
`EXTRA BITS
`TO PACK
`
`105
`
`
`
` RANGE CODES
`
`
`109
`
`MTF CODER
`
`NO
`
`RANGE CODES TO
`LOW LEVEL CODER
`
`
`
`FIG. 8
`
`Page 9
`
`Page 9
`
`

`

`US 6,253,264 B1
`
`1
`CODING NETWORK GROUPING DATA OF
`SAME DATA TYPE INTO BLOCKS USING
`FILE DATA STRUCTURE AND SELECTING
`COMPRESSION FOR INDIVIDUAL BLOCK
`BASE ON BLOCK DATA TYPE
`
`RELATED APPLICATIONS
`
`This application claims the benefit of U.S. Provisional
`Serial No. 60/036,548 filed Mar. 7, 1997, the teachings of
`which are incorporated herein by reference in their entirety.
`BACKGROUND OF THE INVENTION
`
`High performance data compression systems use models
`of the data to increase their ability to predict values, which
`in turn leads to greater compression. The best models can be
`achieved by building a compression system to support a
`specific data format. Instead of trying to deduce a crude
`model from the data within a specific file, a format-specific
`compression system can provide a precise pre-determined
`model. The model can take advantage not just of the file
`format structure, but also of statistical data gathered from
`sample databases.
`Previous efforts at format-specific compression have been
`focused on solutions to a few individual formats rather than
`
`on the development of a generalized method that could be
`adapted to many formats. The models which have been
`created typically involve a small number of components.
`This works adequately when mostof the data is included in
`a few components, such as an imagefile having mostly red,
`blue, and green pixel values. But may formats are best
`modeled using a large number of components, and the
`previous systems are not designed to build or encode such
`models.
`
`SUMMARYOF THE INVENTION
`
`A preferred coding system in accordance with the inven-
`tion solves both of these problemsin the priorart: it provides
`a generalized solution which can be adapted to handle a wide
`range of formats, and it effectively handles models that use
`large numbers of components. The system involves new
`approachesat many levels: from the highest(interface) level
`to the lowest (core encoding algorithms) level. The coding
`network includes a compression network for encoding data
`and a decompression network for decoding data.
`At the highest level, a preferred compression system in
`accordance with the invention uses an architecture called a
`Base-Filter-Resource (BFR) system. This approach inte-
`grates the advantages of format-specific compression into a
`general-purpose compression tool serving a wide range of
`data formats. The system includesfilters which each support
`a specific data format, such as for Excel XLS worksheets or
`Word DOC files. The base includes the system control
`modules and a library used byall the filters. The resources
`include routines which are used by more than onefilter, but
`which are not part of the base. If a filter is installed which
`matches the format of the data to be encoded, the advantages
`of format-specific compression can be realized for that data.
`Otherwise, a “generic” filter is used which achieves perfor-
`mance similar to other non-specific data compression sys-
`tems (such as PKZip, Stacker, etc.).
`Atthe next level, the system preferably includes a method
`for parsing source data into individual components. The
`basic approach,called “structure flipping,” provides a key to
`converting format
`information into compression models.
`Structure flipping reorganizes the information ina file so that
`similar components that are normally separated are grouped
`together.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`Upon this foundation are a numbersof tools, such as:
`a language to simplify the creation of parsing routines;
`tools to parse the source data using this method into
`separate components; and
`tools to generate models for individual components by
`automated analysis of sample data bases.
`These tools can be applied to the filters for a wide range of
`file and data types.
`The compression system preferably includes tools called
`customized array transforms for specific filters and for
`certain types of filters. These techniques handle a specific
`file type or certain data constructions used by several file
`types, such as encoding two dimensional arrays of data as
`found in many database formats.
`At the low-level of the system are preferably a number of
`mechanisms for encoding data arrays, including:
`new low-level encoding algorithms;
`methods for integrating a large numberof transforms and
`encoding algorithms;
`methods for eliminating overhead so that small data
`blocks can be efficiently coded; and
`a new method for
`integrating both static models
`(determined from statistical analysis of sample
`databases) and dynamic models (adapted to the data
`within a particular array) into the encoding of each
`component.
`A preferred method of encoding source data comprising
`parsing the source data into a plurality of blocks. The parsed
`blocks typically have a different format than the source data
`format. In particular, similar data from the source data are
`collected and grouped into respective blocks.
`For each block, a compression algorithm is selected from
`a plurality of candidate compression algorithms and applied
`to the block. The compression algorithms can be determined
`based on the amount of data in the respective block.
`Furthermore, the compression algorithm can be adapted to
`the respective block,
`including the use of a customized
`transform. The selection of an algorithm can also be based
`on a compression model, which is derived from the format
`of the source data. The compressed data from the plurality
`of blocks are then combined into encoded data.
`
`The coding network can also include a decompression
`network to convert the encoded data back into the source
`data. First
`the data is decoded and then the parsing is
`reversed. In a lossless system, the resulting data is identical
`to the source data.
`
`Embodiments of the invention preferably take the form of
`machine executable instructions embedded in a machine-
`
`readable format on a CD-ROM,floppy disk or hard disk, or
`another machine-readable distribution medium. These
`instructions are executed by one or moreprocessing units to
`implement the compression and decompression networks.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The foregoing and other objects, features and advantages
`of the invention will be apparent from the following more
`particular description of preferred embodiments of the
`invention, as illustrated in the accompanying drawings in
`which like reference characters refer to the same parts
`throughout the different views. The drawings are not nec-
`essarily to scale, emphasis instead being placed upon illus-
`trating the principles of the invention.
`FIG. 1 is a high-level block diagram of a typical data
`compression system.
`FIG. 2 is a block diagram of a preferred encoder using the
`BFR system in accordance with the invention.
`
`Page 10
`
`Page 10
`
`

`

`US 6,253,264 B1
`
`3
`FIG. 3 is a flow chart of an array coder.
`FIG. 4 is a block diagram ofthe string coder 40 of FIG.
`
`3
`
`FIG. 5 is a block diagram ofthe float coder 50 of FIG. 3.
`FIG. 6 is a flowchart of a preferred automated analysis
`system.
`FIG. 7 is a flowchart overviewing the high-level of the
`integer coder.
`FIG. 8 is a flowchart overviewing the mid-level of the
`integer coder.
`
`DETAILED DESCRIPTIONS OF PREFERRED
`EMBODIMENTS OF THE INVENTION
`
`4
`invention. The encoder 3' is based on the use of a plurality
`of filters 10a,..., 10x, ... , 10z which serve specific file
`formats. For example, one filter 10a might support several
`versions of the DBF database format, while anotherfilter 10z
`might support several versions of the DOC format used by
`the Microsoft Word software program. The individualfilters
`provide respective selection criteria 12 to a filter selection
`system 22.
`The filter selection system 22 receives the source data 2
`and checksthe selection criteria 12a,...,12x,..., 12z of
`all filters 10a, ..., 10x, ..., 10z installed in the system to
`see if any of them support the source data’s format. If not,
`a “generic” filter is used which provides compression per-
`formance similar to other generic compression systems,
`such as Lempel-Ziv (LZ) engines. In a particular preferred
`FIG. 1 is a high-level block diagram of a typical data
`embodimentof the invention, the generic compression sys-
`compression system. The primary components of the system
`tem employs an SZIP engine as described by Mr. Schindler
`are a sending application 1, an encoder 3, a decoder 5, and
`in US. application Ser. No. 08/970,220 filed Nov. 14, 1997,
`a receiving application 7.
`the teachings of which are incorporated herein by reference
`The sending application 1 provides source data 2 to be
`in their entirety. The descriptions of the network will pri-
`encoded by the encoder 3. The sending application can be a
`marily cover the situations in whichafilter to support the
`file archiver, a communication system, or any other appli-
`data format is successfully found.
`cation with a need to represent data using fewer bytes than
`in the native format.
`Briefly, a parsing system 24 parses the source data 2 into
`a large numberof parsed Arrays 25, which each includeall
`The encoder 3 receives the source data 2 and uses data
`the instances of a particular component in the source file.
`The term “Array”, described in further detail below,
`is
`capitalized to indicate that this refers to a type of structure
`specific to the network, as opposed to the usual use of the
`term “array”. The algorithm used to parse the source data 2
`is called “Structure Flipping”, an important aspect of the
`network, described in further detail below.
`The BFR system uses an automated analysis system (also
`described in detail below) to build a model for each Array
`25, which will be used when the Arrays 25 are encoded in
`an array coder 28. In somecases, better compression can be
`achieved if customized array transforms 26 are first used.
`While the models generated by the automated analysis
`system process each component Array 25 independently, the
`customized routines can take advantage of inter-component
`relationships.
`The array coder 28 encodes the Arrays 25 using a count-
`dependent modeling system, which includes a mixture of
`static models (constant for all databases) and dynamic
`adjustment to each particular Array 25.
`Thefilter 10x has components to support each of the four
`sections of the encoder 3'. The selection criteria 12x, deter-
`mined bythefile format specification, includes information
`neededto recognizefiles served by the filter 10x such as byte
`values at the beginning ofthefile orfile title suffices. Parsing
`instructions 14, also determined by the file format
`specification, include the information needed to parse the
`source file into parsed Arrays 25 havingall the instances of
`a particular component
`in the source file. The custom
`component models 16 are constructed by referencing both
`the file format specification and sample databases. They can
`take advantage of inter-component relationships to trans-
`form someof the component Arrays 25 to make them more
`compressible. Encoding parameters 18 are generated by an
`automated analysis system which examines component
`Arrays 25 parsed from a large number of sample databases.
`The encoding parameters 18 provide the data necessary to
`support the count-dependent modeling system in the array
`coder 28.
`FILTER SELECTION SYSTEM
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`compression algorithms to represent the data using fewer
`bytes. The typical implementation of the encoder 3 is as
`software running on a general purpose computer, although
`the algorithms to be described may also be implemented on
`specialized hardware.
`The output of the encoderis encoded data 4, which can be
`stored in memory (in which case the encoding allows more
`source data to be stored within the same hardware), or
`transmitted to the decoder 5 (in which case the encoding the
`source data to be transmitted faster when the transmission
`channel bandwidth is limited).
`The decoder5 retrieves or receives the encoded data 4 and
`reverses the algorithms used to encode the data so as to
`reconstruct the source data as decoded data 6. In a lossless
`
`system, the decoded data 6 is identical to the source data.
`Depending on the application, however, some data loss may
`be acceptable. As with the encoder 3,
`the decoder 5 is
`typically embodied as software running on a general purpose
`computer, but can be implemented in specialized hardware.
`The receiving application 7 receives the decoded data 6
`for processing.
`A preferred compression engine includes filter-based
`encoders and decoders which can be utilized by a large
`numberof sending applications 1 and receiving applications
`7. While the applications 1,7 that store/retrieve or transmit/
`receive the encoded data are not part of the compression
`system, they may adjust encoding and decoding operations
`according to system conditions such as available transmis-
`sion channel bandwidth.
`
`The descriptions to follow will mostly cover the encoder.
`The decoderjust reverses the encoder’s actions, so a descrip-
`tion of the encoder is sufficient
`to describe the overall
`
`encoding/decoding system. The term “File” will be used to
`describe a block of source data, which matches the conven-
`tional usage, but which should also be understood to be
`applicable to many other types of data blocks such as a data
`exchange in a Client-Server application.
`
`ENCODER
`
`FIG. 2 is a block diagram of a preferred encoder using a
`Base-Filter-Resource (BFR) network in accordance with the
`
`A preferred compression system uses an expandable
`architecture that allows users to support new or updated
`
`Page 11
`
`Page 11
`
`

`

`US 6,253,264 B1
`
`5
`formats by adding orreplacing filters 10. As noted above, a
`filter 10 can support more than one format. For example, a
`filter called “LOTUS123v4” might support all file formats
`associated with the Lotus 123 program, such as the current
`WK4format and the earlier WK3 and FM3 formats. If a
`
`Filter was later developed for a new Lotus WKS5 format, a
`user replaces the old filter with one supporting the new
`format and which would also be backwardly compatible
`with the earlier filter with respect to the older formats.
`In the Microsoft Windows environment, each Filter is a
`separate dynamically linked library (DLL) file. In addition
`to data tables, a filter includes executable code.
`If a filter is found fora file, then it is dynamically linked
`and then called to encode the file. The encoded data stream
`
`identify files via byte values near the beginning ofthefile.
`The identification procedure uses a search tree method to
`reduce the time needed to check the list when manyfilters
`are installed. From this registry, the appropriate filter 10 is
`invoked for the file type.
`To handle compound file formats such as Microsoft’s
`OLE2, one filter can invoke another filter to encode a
`sub-file or stream within the compoundfile. For example, a
`Winword Filter might call an Excel Filter to encode some
`Excel data which was embedded within a compound docu-
`mentfile.
`PARSING SYSTEM
`
`As noted above, the parsing system 24 creates a large
`numberof parsed Arrays 25. Each parsed Array 25 includes
`all instances of a particular componentin a file. A separate
`component
`is created for each member of each defined
`structure type.
`
`EXAMPLE1
`
`For example, consider the following description of a record
`defined in a file format description.
`Cell Value:
`
`The format uses a special method of coding numbersin an
`internal number type, called an RK number, to save
`memory and disk space.
`Record Data:
`
`Rec
`
`Row
`
`1
`2
`3
`4
`5
`
`0x0001
`0x0001
`0x0001
`0x0001
`0x0001
`
`Col
`
`0x0002
`0x0003
`0x0004
`0x0006
`0x0007
`
`Index
`
`0x0055
`Ox005f
`0x0053
`Ox005f
`Ox005f
`
`Num Val
`
`Ox3ff1e100
`0x3ff1a300
`0x3ff1d000
`Ox3ff1c400
`0x3ff1d300
`
`10
`
`15
`
`20
`
`30
`
`35
`
`40
`
`45
`
`50
`
`The source data would appear almost random to a byte-
`matching algorithm (shown in hex):
`01 00 02 00 55 00 00 el fi 3f01 00 03 00 5f 00 00 a3 fl
`3f 01 00 04 00 53 00 00 dO fi 3f
`01 00 06 00 5f 00 00 c4 el fl 3f 01 00 07 00 5f 00 00 d3
`fl 3f
`includes an identification (ID) code indicating which filter
`But when handled as four separate Arrays 25 (Row, Col,
`was used to encode the data, and the decoder has to have a
`Index, and Num) using algorithms optimized for each
`corresponding decodingfilter.
`component,
`the data can compress significantly. This
`the base
`To implement the filter selection system 22,
`approachis referred to herein as “structure flipping” because
`module maintains a registry of the currently installed files,
`of the way it rearranges the data.
`which includes the data need to identify source files of that
`To encode together the instances of the same component
`type as well as the path to the DLL. This registry is updated
`in a database is a common compression technique, and the
`
`each timeafilter is added or replaced. The identification data file format is often used to identify such components. But in
`can include bothalist offile title suffices and a method to
`25
`the priorart, this approach has been limitedto isolating a few
`major components of a specific file format, and the file
`format description used only as a technique offinding these
`major components. In accordance with preferred embodi-
`ments of the invention, a file format is actually converted
`into a compression model, which is then interfaced into an
`automated system for analyzing the data and optimizing the
`coding performance of each component. While this com-
`pression model is usually enhanced by the customized array
`transforms 26, it is still an aspect of the overall system.
`The extent
`to which preferred approaches differ from
`prior art systemsis reflected in the new tools constructed in
`order to implementit. This approach creates a large number
`of Arrays 25. Each Array 25 requires a different compression
`model, and the number of entries in each can change
`significantly from file to file. In order to work effectively in
`such a case, preferred embodimentsof the invention employ
`new approachesin how the low-level coding is handled such
`as the elimination of overhead, count-dependent processing,
`and the use of a processing network instead of fixed coding
`algorithms. These techniques, described above, are preferred
`in order to handle the output of the parsing system 24.
`All of these implementation features are integrated at a
`high level, so that high-level calls can handle all of the
`operations needed to initialize the system, parse the data,
`and encode the Arrays.
`In accordance with a particular preferred embodiment of
`the invention, the file parsing system 24 uses three sub-
`systems:
`
`Offset
`
`Name
`
`Size
`
`Contents
`
`4
`6
`8
`
`10
`
`rw
`col
`fmt
`
`num
`
`2
`2
`2
`
`4
`
`Row number
`Column number
`Index to the record that includes
`the cell format
`the coded number
`
`The parsing system 24 creates parsed Arrays 25 for each
`of the four components of this record. Each time an RK
`record is found, an entry is added to each array. The parsed
`Arrays 25 are much more compressible than the raw source
`data, as shown bydata from five records:
`
`55
`
`60
`
`65
`
`FILE_BUFFER
`ARRAYS
`PVSTRUCT
`
`Input\Output Interface
`To store the data for a single component
`To parse the input into Arrays or to write the decoded
`Arrays into the Output stream
`
`A plurality of FILE_BUFFERroutines provide a flexible
`input/output interface using FILE__BUFFERstructures, so
`the sameset of parsing routines can be used regardlessof the
`input/output formats. This allows the compression system to
`be independent of the I/O format. The file parsing system
`preferably supports two file formats: DOS and OLE2, but
`this can expanded to other file formats as well as to inter-
`
`Page 12
`
`Page 12
`
`

`

`US 6,253,264 B1
`
`7
`faces to communication ports and other I/O options. For
`DOSfiles, the logical stream positions are usually the same
`as the raw file positions. The OLE2 format requires an
`extensive layer of code to follow an OLE2 stream within the
`containerfile.
`
`8
`in the body of the record. The file specification then
`describes the contents of each record. A specification for a
`TABLE SIZE record might be:
`
`RecType=0x402
`Rec Body Size=12
`Record Data:
`
`Offset Name
`
`Size Contents
`
`4.
`6
`8
`10
`12
`
`MinRow
` MaxRow
` MinCol
` MaxCol
`(Reserved)
`
`First defined row on the sheet
`2
`Last defined row on the sheet, plus 1
`2
`First defined column on the sheet
`2
`Last defined column onthe sheet, plus 1
`2
`4 Reserved
`
`10
`
`15
`
`20
`
`A plurality of ARRAY routines provide a system for
`managing the process of initializing, adding data to,
`encoding, and freeing arrays of data. Each ARRAYstructure
`includes one array of data and the information needed to
`manage these processes, such as the numberof entries in the
`Array 25, the size of the buffer currently allocated for the
`array, and information which will be useful in encoding the
`array.
`In particular, ARRAYXX structure stores all the data
`from a component, and includes mechanismsto supportfast
`sequential read/write access to the data, an expandable
`buffer for the data, and storage of model data. This structure
`There can be more than one such record in the data. When
`also allows a simple set of tools to handle all allocation,
`there is more than one record,
`the values of the same
`parsing, and coding functions. The _XX nomenclature
`component in each record are going to be similar to each
`indicates that slightly different ARRAY structures are
`other. This enables the compression system to take advan-
`defined for different data types, such as different integer
`tage of the similarities to increase the compressionratio.
`wordsizes, strings, and floating point types, although many
`The compression system is preferably based on the
`of the processing routines work on all of the variations.
`ARRAY_XX structures, including:
`A plurality of PVSTRUCT routines make up an auto-
`25
`mated system for parsing the file data. During encoding,
`they read data from the FILE_BUFFERstructure and write
`data into Arrays 25. During decoding, they take data from
`for 8-bit data items, treated by the encoding system as
`being unsigned
`the Arrays 25 and write it into the FILEBUFFERstruc-
`for 16-bit data items, treated by the encoding system as
`tures. These routines can take a high-level description of a
`being signed
`file format and handle all the tasks associated with managing
`for 32-bit data items, treated by the encoding system as
`the parsing as well as the encoding of the parsed data. Many
`being signed
`for 32-bit floats
`ARRAY_F32:
`of the Filters 10 will also include customized parsing
`for 64-bit floats (“doubles”)
`ARRAY_F64
`routines which call the FILE.BUFFER and ARRAYrou-
`for 80-bit floats (“long doubles”)
`ARRAY_F80:
`tines directly instead of using the PVSTRUCTroutines. But
`ARRAY_STR:_each entry includes several bytes whose countis given
`35
`an aspect of the parsing system 24 is the PVSTRUCT
`by a size (string size).
`routines, and this discussion will be concerned primarily
`with their operation.
`A PVSTRUCTstructure includes all information needed
`
`ARRAY_U8:
`
`30
`
`ARRAY_16:
`
`ARRAY_32:
`
`For the TABLE SIZErecord, five arrays would be created
`(referring to the record specification). Each time a record is
`to parse and code a data structure defined in a file format.
`parsed, an entry is added into each ofthe five arrays.
`This usually involves multiple components, so it provides
`When all
`the records had been parsed, a function is
`and handles individual Arrays 25 for each component. The
`invoked to encode the items in each array. This function
`term “pvstruct” comes from Predictor Variable Structure,
`includes a large number of algorithms, which can exploit a
`where Predictor refers to the use of statistical data gained
`wide variety of relationships to achieve maximum compres-
`from analyzing a large numberof samplefiles to assist in the
`sion. The implementation is extremely efficient in that it
`encoding of the data, and Variable indicates that it supports
`requires virtually no overhead: an array with only one entry
`structures where the component counts can vary at run time
`can even be efficiently compressed. This is necessary
`for each instance (called “dynamic structures” in some
`because the parsing system can transformafile into a huge
`textbooks).
`number of these small arrays. The encoding function can
`A parsing language is a defined instruction set which
`also free the data after it finishes encodingit.
`provides a concise way to specify the parsing of a
`The decoding process reverses the sequence. First, each of
`PVSTRUCTstructure. The instruction set includes support
`the component arrays is decoded. Then the parsing is
`for a wide range of dynamic variation, including where the
`reversed.
`counts of components are dependent on file counts of other
`As this type of parsing has to be done extensively, a
`components as well as on external factors. It incorporates
`numberof tools are used to simplify the process. They are
`other file format data, such as defined values or range
`based on the use of instructions called vstruct_defs. To
`restrictions. It also provides a means of exchanging data
`continue with the TABLE SIZE example, the instruction set
`between the parser andthefilter.
`to parse such a record would be:
`The compression system is based on parsingafile so as
`to place items together that are similar. In the following
`description of the compression system, examples illustrating
`some structures which might be found in a typical spread-
`sheet file format.
`
`40
`
`45
`
`50
`
`55
`
`60
`
`EXAMPLE 2
`
`Spreadsheet files typically include a series of records.
`Each record begins with a 16-bit word indicating the type of
`record followed by a 16-bit word giving the numberof bytes
`
`65
`
`// the first 3 instructions always provide general info about the record:
`8
`// num instructions in this set
`12,
`// size of the record body, which is a fixed sz of 12 bytes
`5,
`// num arrays to be created
`// the remaining instructions tell how to parse the record:
`INT16,
`// Parse a 16 bit integer for the first component (“MinRow’)
`INT16,
`
`Page 13
`
`Page 13
`
`

`

`US 6,253,264 B1
`
`9
`
`-continued
`
`10
`
`-continued
`#define FLOAT32
`#define FLOAT64
`#define FLOAT80
`#define STRING
`#define STRINGZ
`#define SYS_S1
`#define SYS_SZ1
`#define SYS_S2
`#define SYS_SZ2
`
`BAINMBW
`
`9
`Oxa
`Oxb
`
`INT16,
`INT16,
`INT32 };
`
` // Parse a 32 bit integer for the last component (“Reserved”)
`
`The vstruct_def instructions are 16 bit integers. The first
`three words always include the same general info about the
`record. The rest of the words include instructions which tell
`
`the automated parsing system how to proceed when encoun-
`tering such a record. In this case, the instructions are the
`simplest possible—just a list of data types for each compo-
`nent of the record. The parser would create an array for each
`component, and load one entry of that type into the array
`each time it parses a record of that type:
`Functions are then provided to automate the parsing and
`encoding of records based on these instructions. Again, the
`decoding process reverses the actions of the encoder.
`The vstruct_def instructions can handle situations much
`more complex than the previous example. For convenience,
`a summary of the construction of these instructions will be
`summarized next. The upper byte of the 16 bit instruction is
`an argumentas required by the lower b

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket