throbber
(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date
`12 July 2001 (12.07.2001)
`
`
`
`PCT
`
`(10) International Publication Number
`WO 01/50325 A2
`
`(51) International Patent Classification7:
`
`G06F 17/00
`
`(21) International Application Number:
`
`PCT/US00/35672
`
`(22) International Filing Date:
`29 December 2000 (29.12.2000)
`
`(81) Designated States (national): AE, AG, AL, AM, AT, AU,
`AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CR, CU, CZ,
`DE, DK, DM, DZ, EE, ES, FI, GB, GD, GE, GH, GM, HR,
`HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR,
`LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX, MZ,
`NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM,
`TR, TT, TZ, UA, UG, UZ, VN, YU, ZA, ZW.
`
`(25) Filing Language:
`
`(26) Publication Language:
`
`English
`
`English
`
`(30) Priority Data:
`60/174,305
`
`3 January 2000 (03.01.2000)
`
`US
`
`(71) Applicant: EFECKTA TECHNOLOGIES CORPO-
`RATION [US/US]; 1370 Bob Adams Drive, Steamboat
`Springs, CA 80487 (US).
`
`(72) Inventors: AVERY, Caleb; —. TOBELMANN, Ralph; —.
`
`(84) Designated States (regional): ARIPO patent (GH, GM,
`KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZW), Eurasian
`patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European
`patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE,
`IT, LU, MC, NL, PT, SE, TR), OAPI patent (BF, BJ, CF,
`CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).
`
`Published:
`
`Without international search report and to be republished
`upon receipt of that report.
`
`(74) Agents: SACHS, Robert et al.; Fenwick & West LLP, Two
`Palo Alto Square, Palo Alto, CA 94306 (US).
`
`For two-letter codes and other abbreviations, refer to the "Guid-
`ance Notes on Codes andAbbreviations ” appearing at the begin-
`ning ofeach regular issue ofthe PCT Gazette.
`
`(54) Title: EFFICIENT AND LOSSLESS CONVERSION FOR TRANSMISSION OR STORAGE OF DATA
`
`102
`
` Binary Data
`
`Segment Data
`
`Select Transform For
`Segment
`
`104
`
`106
`
`WO01/50325A2
`
`108
`
`110
`
`Package Each
`Transform
`
`
`
`Store/Transmit
`
`(57) Abstract: A system and method for lossless data compression. A mathematical transform equivalent to the content value of
`the data, and taking fewer bits to represent, is found.
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 1
`
`NetApp; Rackspace Exhibit 1016 Page 1
`
`

`

`
`
`WO 01/50325
`
`PCT/US00/35672
`
`EFFICIENT AND LOSSLESS CONVERSION FOR TRANSMISSION OR
`
`STORAGE OF DATA
`
`This application claims priority under 35 U.S.C. §119(e)
`
`from US. Patent
`
`Application serial No. 60/174,305, filed January 3, 2000, which is incorporated by reference
`
`in its entirety.
`
`BACKGROUND
`
`Field of the Invention
`
`This invention relates to data transformation, more particularly, to lossless data
`
`compression.
`
`Background of the Invention
`
`Existing compression technology focuses on finding and removing redundancy in
`
`the input binary data. Early compression approaches focused on the format of the data.
`
`These format approaches utilize run—length encoding (RLE) and variations of frequency
`
`mapping methods. These pattern-coding approaches work well for ASCII character data,
`
`but never reached the compression potential for other data formats.
`
`Advances in compression technology evolved from information theory, particularly
`
`Claude Shannon’s work on information entropy. The bulk of this work is statistical in
`
`nature.
`
`Shannon-Fano and Huffman encoding build probability trees of symbols in
`
`descending order of their occurrence in the source data, allowing the generation of “good”
`
`variable—size codes.
`
`This is often referred to as entropy coding.
`
`Compression is
`
`accomplished because more frequently occurring binary patterns are assigned shorter codes,
`
`allowing for a reduction in the overall average of bits required for a message.
`
`Shannon-Fano and Huffman encoding are optimal only when the probability of a
`
`pattern’s occurrence is a negative power of 2. These methods engendered a number of
`
`adaptive versions that optimize the probability trees as the data varies.
`
`10
`
`15
`
`20
`
`25
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 2
`
`NetApp; Rackspace Exhibit 1016 Page 2
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`Arithmetic Coding overcame the negative power of 2 probabilities problem by
`
`assigning one (normally long) code to the entire data. This method reads the data, symbol by
`
`symbol, and appends bits to the output code each time more patterns are recognized.
`
`The need for more efficiency in text encoding led to the development and evolution of
`
`dictionary encoding, typified by LZ family of algorithms developed by J. Ziv and A. Lempel.
`
`These methods spawned numerous variations.
`
`In these methods, strings of symbols (a
`
`dictionary) are built up as they are encountered, and then coded as tokens. Output is then a
`
`mix of an index and raw data.
`
`10
`
`15
`
`As with entropy coding, dictionary methods can be static, or adaptive. Variants of the
`
`LZ family make use of different techniques to optimize the dictionary and its index. These
`
`techniques include:' search buffers, look-ahead buffers, history buffers, sliding windows, hash
`
`tables, pointers, and circular queues. These techniques serve to reduce the bloat of seldom-
`
`used dictionary entries. The popularity of these methods is due to their simplicity, speed,
`
`reasonable compression rates, and low memory requirements.
`
`Different types of information tend to create specific binary patterns. Redundancy or
`
`entropy compression methods are directly dependent upon symbolic data, and the inherent
`
`patterns that can be recognized, mapped, and reduced. As a result, different methods must be
`
`optimized for different types of information. The compression is as efficient as the method of
`
`modeling the underlying data. However, there are limits to the structures that can be mapped
`
`20
`
`and reduced.
`
`The redundancy-based methodologies are limited in application and/or performance.
`
`In general, entropy coding either compromises speed or compression when addressing the
`limited redundancy that can be efficiently removed. Typically, these methods have very 1c; (
`
`compression gain. The primary advantage is that entropy coding can be implemented to
`
`25
`
`remain lossless.
`
`Lossy compression can often be applied to diffuse data such as data representing
`
`speech, audio,
`
`image, and video. Lossy compression implies that the data cannot be
`
`reconstructed exactly. Certain applications can afford to lose data during compression and
`
`reconstitution because of the limitations of human auditory and visual systems in interpreting
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 3
`
`NetApp; Rackspace Exhibit 1016 Page 3
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`the information. Perceptual coding techniques are used to exploit these limitations of the
`
`human eyes and ears. A perceptual coding model followed by entropy encoding, which uses
`
`one of the previously discussed techniques, produces effective compression. However, a
`
`unique model (and entropy coder) is needed for each type of data because the requirements
`
`are so different. Further, the lossy nature of such compression techniques mean the results
`
`lose some fidelity, at times noticable, from the original, and make them unsuitable for many
`
`purposes.
`
`Thus, a method for compression that is both lossless and capable of high compression
`
`gain is needed.
`
`10
`
`15
`
`20
`
`25
`
`SUMMARY OF THE INVENTION
`
`The present invention compresses binary data. The data is split into segments. Each
`
`of these segments has a numerical value. A transform, along with state information for that
`
`transform, is selected for each segment. The numerical value of the transform with its state
`
`information is equal to the numerical value of the segment. The transform, state information
`
`and packet overhead are packaged into a transform packet. The bit—length of the transform
`
`packet is compared to the bit-length of a segment packet that includes the raw segment and
`
`any necessary packet overhead. The packet with the smaller bit-length is chosen and stored
`
`or transmitted. After reception of the packets, or retrieval of the packets from storage, the
`
`numerical value of each segment is recalculated from the transform and state information, if
`
`necessary. The segments are recombined to reconstitute the original binary data.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`Fig. 1 is a flow chart showing an overview of the compression process.
`
`Fig. 2 is a flow chart showing an overview of the recovery process.
`
`Fig. 3 is a block diagram of the compression system and illustrates the general flow of
`
`data through the compression system.
`
`Fig. 4 is a flow chart of the pre-processor.
`
`Fig. 5 is a flow chart of the transform engine.
`
`Fig. 6 is a flow chart of a transform process.
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 4
`
`NetApp; Rackspace Exhibit 1016 Page 4
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`Fig. 7 illustrates finding a set of favorable state information without testing every
`
`possible set of state information.
`
`Fig. 8 illustrates partial solutions Within a segment.
`
`Detailed Description of the Preferred Embodiments
`
`Fig. 1 is a flow chart showmg an overview of the compression process 100. The
`
`initial input 102 is binary data. Any piece of binary data is simply a number expressed in
`
`binary form. Thus, any piece of binary data has a numerical value. This numerical value is
`
`the data’s “content value.”
`
`The input binary data 102 is split 104 into segments, if necessary. The input binary
`
`data may be short enough that it is not further split. Each segment has a content value. The
`
`content value of each segment is identified and a transform along with appropriate state
`
`information is selected and tested 106 for each segment. A general transform is capable of
`
`representing many values. The state information provides the information necessary to
`
`specify an exact value for the transform. The term “state information” includes any variables,
`
`coefficients, remainders, or any other information necessary to set the specific numerical
`
`value for the transform. In some embodiments, “packet overhea ,” is added to the transform
`
`and state information. The packet overhead includes any information beyond the transform
`
`and state information needed to allow later recalculation of the original segment and
`
`reconstitution of the original input binary data.
`
`The transform With its state information has the identical numerical value as the
`
`corresponding segment. The following equation represents the transform concept:
`
`M=T(State Information),
`
`where M is the content value of the segment, and T is the transform. The transform is an
`
`arithmetic transform, a logical transform, or another mathematical transform.
`
`The transform with its state information and packet overhead has a representation
`
`efficiency gain (“REG”). The REG is a measurement of the efficiency of the transform, state
`
`information, and any packet overhead. The REG is defined as the ratio of LogzM/ LogZT,
`
`where LogzM is the number of binary bits required to represent M, and Long is the number of
`
`binary bits required to decodably represent the transform, state information, and packet
`
`10
`
`15
`
`20
`
`25
`
`30
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 5
`
`NetApp; Rackspace Exhibit 1016 Page 5
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`overhead. Thus, if the REG value is greater than one, the transform plus state information
`
`and packet overhead takes up fewer bits than the segment.
`
`For example, a 9-bit size message is capable of representing the transform ab: , where
`
`each of a, b, and c is a 3-bit message. In this case, abc is the transform, and a, b, and c are
`
`each a variable of the transform. The values of a, b, and c make up the state information.
`
`Both the transform and the state information are necessary to make an expression whose
`
`numerical value is equal to the content value of the segment.
`
`With the a”c example above, nine bits are capable of representing an integer over
`
`700,000 decimal digits long, equivalent to more than 2.3 million binary bits. As an example,
`
`10
`
`this transform can represent the content value of 150,094,635,296,999,121, or 362 using only
`
`nine bits. This number would occupy fifty—eight bits if conventionally represented in binary
`
`form. Thus, by using this transform, the 9-bit message transmits 58 bits of content value.
`
`The 58-bit data segment has been greatly compressed.
`
`Each transform and accompanying state information are next packaged 108 into a
`
`packet. The packet includes additional packet overhead that provides any other information
`
`necessary to allow later unpackaging and recovery of the content value of the packet. This
`
`packet overhead can include information that identifies the segment, identifies the transform,
`
`or any other necessary information. The packet typically takes fewer bits to express than the
`
`original segment. Thus, the segment has been compressed.
`
`Each packet is then stored or transmitted 110. Because the packet is typically smaller
`
`than the segment, storing the packet takes up less space than the segment, and transmitting
`
`the packet takes less time than transmitting the segment.
`
`Fig. 2 is a flow chart showing an overview of the recovery process 200, in which the
`
`original data is recovered from packets. The packet is received 202 if it has been transmitted,
`
`or retrieved fi‘om storage if it has been stored. Next, the packet coding is decoded 204 to
`
`determine the identity of the transform and how to use the state information to recover the
`
`original segment. Using the decoded information, the original segment’s content value is
`
`recalculated 206. Finally, all recomputed segments are put together again in their original
`
`order to reconstitute 208 the original input binary data. Thus, the output 210 of the recovery
`
`process 200 is identical to the input 102 of the compression process.
`
`15
`
`20
`
`25
`
`30
`
`5
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 6
`
`NetApp; Rackspace Exhibit 1016 Page 6
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`System Overview
`
`Fig. 3 is a block diagram of the compression system 300 and illustrates the general
`
`’ flow of data through the compression system 300. The compression system 300 performs the
`
`processes of Figs. 1 and 2. The flow through the compression system 300 starts with the
`
`input of a binary string 302 that is to be compressed.
`
`The control program 306 serves as the process controller for all compression system
`
`300 functions. The control program 306 monitors other compression system 300 components
`
`and tracks all information related to the input binary string throughout processing. The
`
`control program 306 also interacts with and terminates any process according to time, trial, or
`
`10
`
`other parameters.
`
`The pre-processor 304 takes the input binary string 302 and splits the binary string
`
`302 into segments. The length of each segment is appropriate to a given data type, transforms
`
`likely to be used, processor capacity, and application parameters. In some situations, the pre—
`
`processor 304 also mutates the segments in order to give more favorable variations of a
`
`15
`
`segment to the transform engine 308.
`
`The transform engine 308 receives the segments from the pre-processor 304 and
`
`computes permutations of transforms and state information, each permutation equivalent to
`
`the content value of the segment. The size of each permutation, including associated packet
`
`overhead, is analyzed, and the most efficient combination of transform, state information, and
`
`packet overhead is selected. If the permutations show no data compression, the segment is
`
`sent unmodified to the packager 310 as raw data.
`
`The packager 310 receives the output of the transform engine 308. For each segment,
`
`the packager receives from the control program 306 all relevant information, including state
`
`information, as to how the segment was transformed. From the transform, state information,
`
`and other associated information, the packager 310 creates a packet. Thus, the packet
`
`includes the transform, state information, and packet overhead information that together
`
`allows later decoding of the packet.
`
`20
`
`25
`
`At this point, the input binary string 302 has been segmented, transformed, and
`
`packaged into packets. Together, the packets are smaller than the original input binary string
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 7
`
`NetApp; Rackspace Exhibit 1016 Page 7
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`302. These packets can be either stored or transmitted through established storage and
`
`transmission protocols 312. Because the packets are smaller than the original input binary
`
`string, they have the advantage of taking less storage space and being transmitted more
`
`quickly than the original binary data.
`
`The unpackager 314 receives the packet if the packet has been transmitted, or retrieves
`
`the packet from storage if the packet has been stored. The unpackager 314 interprets the
`
`packet overhead instructions, unpacks the packet components and associated information for
`
`use by the decoder 316, and unpacks any unmodified data as raw data segments.
`
`The decoder 316 receives the unpackaged information from the unpackager 314 and
`
`recalculates the content value of the original segment. The decoder 316 applies the state
`
`information to the appropriate transform(s) to recalculate the content value of the segment. If
`the data was mutated, the process is reversed for that given segment to recover the original
`
`format of the segment.
`
`The reconstitutor 318 receives the segments from the decoder 316 and concatenates
`
`the segments in sequence to reconstitute the original binary string. Based on application
`
`parameters, this can be done to an entire file or streamed as needed by the application. The
`
`reconstitutor 318 then outputs the binary string 320. The binary string 320 output by the
`
`reconstitutor 318 is identical to the input binary string. Thus, the compression system 300
`
`provides lossless compression.
`
`Control Program
`
`10
`
`15
`
`20
`
`The control program 306 tracks data related to each data segment and transformation
`
`of the segment, as well as information regarding the pre—processor 304, the transform engine
`
`308, and all other compression system 300 functions. This information can include, but is not
`
`25
`
`limited to:
`
`l. The identification of each segment, including information providing the
`
`location (of the segment within the original input binary string
`
`2. The size of each segment
`
`3. The data type of each segment
`
`30
`
`4. Computational power of the transform engine 308
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 8
`
`NetApp; Rackspace Exhibit 1016 Page 8
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`Computational power of the decoder 316
`
`Application requirements, such as time constraints for real-time streaming
`
`applications
`
`Data type requirements
`
`Whether the segment has been mutated, and the technique used to mutate the
`
`segment
`
`Time spent on the segment by the pre-processor 304
`
`Computational cycles spent on the segment by the pre-processor 304
`
`Time spent on the segment by the transform engine 308
`
`Computational cycles spent on the segment by the transform engine 308
`
`The identity of the transform or transforms used for the segment
`
`Any Combinatorial Optimization Heuristics used to select state information
`
`variables
`
`Tracking information related to any heuristic used
`
`Successful transform(s)
`
`State information of successful transform(s)
`
`Bit—length of transform reference(s) and associated state information for
`
`successful transform(s)
`
`Bit—length of associated packet overhead for successful transform(s)
`
`Partial solution transform(s)
`
`State information of partial solutions (including offsets)
`
`Bit—length of transform references and associated state information for partial
`
`solutions
`
`Bit-length ofpacket overhead for transforms used by partial solutions
`
`Finite State Machine(s) used
`
`Finite State Machine information, such as reference data and Finite State
`
`Machine tree data
`
`de Bruijn Sequence starting point and log of index
`
`3-D Graph Tree tracking information
`
`N—space curve data
`
`9.
`
`10.
`
`ll.
`
`12.
`
`13.
`
`14.
`
`15.
`
`16.
`
`17.
`
`18.
`
`19.
`
`20.
`
`21.
`
`22.
`
`23.
`
`24.
`
`25.
`
`26.
`
`27.
`
`28.
`
`29.
`
`“BOTS” used
`
`NetApp; Rackspace
`
`Exhibit 1016
`
`Page 9
`
`10
`
`15
`
`20
`
`25
`
`30
`
`NetApp; Rackspace Exhibit 1016 Page 9
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`30. BOTS information, such as content value locations of BOTS and deltas from
`
`BOTS to segment content value
`
`The process monitor (not shown) is a major subpart of the control program 306. The
`
`process monitor monitors the pre—processor 304 and the transform engine 308. The process
`
`monitor has parameters that constrain the operations of the pre—processor 304 and transform
`
`engine 308. These constraint parameters can include a target REG, the time spent processing,
`
`the number of computational cycles spent processing, or other parameters. A target REG is a
`
`REG value that is sufficiently high so that the compression system 300 stops searching for
`
`higher REG transforms when a combination of transform, state information, and packet
`
`overhead with the target REG is found. The constraint parameters for time spent processing
`
`and computational cycles spent processing ensure that the compression system 300 does not
`
`spend an indefinite amount of time and resources searching for the best combination of
`
`transform and state information. The parameters are preset or changed by the control
`
`program 306. The parameters are changed if the data type changes, if the computational
`
`resources change, if the application changes, or by human or other external intervention.
`
`If the pre—processor 304 or the transform engine 308 exceeds a constraint parameter,
`
`the process monitor signals the control program 306 to terminate pre—processor 304 or
`
`transform engine 308 operations. When pre—processor 304 operations are terminated, the
`
`binary string is split into segments. The length of the segments are the best approximation of
`
`optimum length thus far determined by the pre-processor 304. When transform engine 308
`
`operations are terminated, the transform engine 308 will output the current best transform and
`
`state information, or, if no transforms compress the data, send the segment as raw data.
`
`Pre—Processor
`
`Fig. 4 is a flow chart of the pre~processor 304. First, the pre—processor 304 analyzes
`
`400 the data type of the binary string 302.
`
`In analyzing the data type, the pre—processor 304
`
`attempts to identify or characterize the data type of the binary string 302. The data type is
`
`used to help determine which segment size, which transform, and which state information for
`
`the transform, which will be used to test and transform the binary data. The pre—processor
`
`304 characterizes the binary string 302 by either knowing which application generated the
`
`binary string 302 or by analyzing the data itself. One method for analyzing the data to
`
`determine the data type is comparing the binary string 302 to an existing database of data
`
`10
`
`15
`
`20
`
`25
`
`30
`
`NetApp;Rackspace
`
`Exhibit1016
`
`Page10
`
`NetApp; Rackspace Exhibit 1016 Page 10
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`types. One method for generating such a database is by sampling information generated by
`
`known digital applications.
`
`If the pre-processor 304 characterizes the binary string 302, the control program 306
`
`stores the data type associated with that string. If the pro-processor 304 cannot characterize
`
`the binary string 302, the control program 306 keeps a record that the binary string’s 302 data
`
`type is unknown.
`
`The pre-processor 304 next determines 402 whether the size of the input binary string
`
`302 is greater than a minimum optimum processing block size. The minimum optimum
`
`processing size is different for different data types. The control program 306 stores each
`
`known minimum optimum processing size associated with the corresponding data type.
`
`If
`
`the pre—processor 304 has characterized the data type and the minimum optimum processing
`
`size for that data type is known, the pro-processor 304 simply compares the input binary
`
`string 302 to the stored minimum optimum processing size.
`
`If the data type is unknown, or an optimum size for a data type is unknown, the binary
`
`string 302 is split 404 into segments of a size that has worked well previously with many
`
`types of data. Alternatively, the binary string 302 is initially split 404 into segments of
`
`differing bit—lengths. As these segments of differing bit-lengths are processed by the
`
`compression system 300, the control program 306 keeps a record of which segment sizes are
`
`most easily processed. By trying different sizes, and tracking which sizes work best, the
`
`control program 306 builds a record of an optimum processing size for each data type, so that
`
`the optimum processing size for that type of data becomes known. If the control program 306
`
`develops a record of an optimum processing size for a data type, that segment size can be
`
`incorporated for future use.
`
`If the input binary string 302 is larger than the minimum optimum processing block
`
`size, the binary string 302 is split 404 into segments. The bit-length of the segments is
`
`determined by several factors.
`
`The computational power of the transform engine 308 affects the bit-length of the
`
`segments. If the transform engine 308 has little computational power available, the segments
`
`must be shorter than if the transform engine 308 has a great deal of computational power.
`
`Similarly, if the computational power of the decoder 316 is lmown, it will influence the bit-
`
`length of the segments. The more computational power the decoder 316 has, the greater the
`
`bit-length of the segment can be.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`10
`
`NetApp;Rackspace
`
`Exhibit1016
`
`Page11
`
`NetApp; Rackspace Exhibit 1016 Page 11
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`a“
`i/The application using the coinpression system 300 also affects the bit-lengths of the
`
`segments. For instance, applications such as video conferencing, which require human-
`
`perceptible real-time encoding, require shorter segments than those of off—line archiving
`
`where larger segments can be transformed and encoded over longer timeframes and more
`
`processing cycles.
`
`The data type also affects the bit-lengths of the segments. For instance, audio data
`
`may lend itself to different segment sizes than ASCII data. Another factor related to data type
`
`is what transforms will likely be used with the data type. The transforms used with a
`
`particular data type may perform better with a specific segment length. The control program
`
`306 monitors the transform engine 308 and stores which transforms work well with specific
`
`data types, which segments sizes work well with those transforms, and which segment sizes
`
`can be incorporated for future use.
`
`Thus, when the pre-processor 304 successfully
`
`characterizes a data type, the control program 306 is capable of retrieving information as to
`
`the optimum bit-length of the segment.
`
`If the input binary string 302 is equal to or smaller than the minimum optimum
`
`processing block size, the binary string 302 is treated as a single segment. Each segment,
`
`whether it is a portion of the split binary string 302, or the entire binary string, is treated the
`
`same throughout the rest of the compression system 300.
`
`The segment size is output 406 to the control program 306. The control program 306
`
`stores the segment size and uses it to compare to the size of possible combinations of
`
`transform, state information and packet overhead.
`
`The control program 306 assigns to each segment specific identification information
`
`so that the original binary data may be reconstituted later, and tracks each of the segments,
`
`along with the identification information, associated data type, settings for the process
`
`monitor, and any other associated data.
`
`Each of the segments may next be mutated 408. The segment is mutated 408 if the
`
`control program 306 has stored information indicating that previously, mutated segments of
`
`that data type and segment size resulted in successful transforms, or that unmutated segments
`
`of that data type and segment size did not result in successful transforms.
`
`If the control
`
`program 306 has no information stored that indicate that a mutated segment would result in
`
`successfiil transforms, the segment is not mutated. However, the segment may subsequently
`
`10
`
`15
`
`20
`
`25
`
`30
`
`‘11
`
`NetApp;Rackspace
`
`Exhibit1016
`
`Page12
`
`NetApp; Rackspace Exhibit 1016 Page 12
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`be returned 412 from the transform engine 308 to be mutated if a successfill transform for the
`
`unmutated segment is not found.
`
`The pre—processor 304 mutates 408 segments by creating different versions of the
`
`segment to provide the transform engine 308 With more permutations of the same segment.
`
`By providing more permutations of the segment,
`
`the likelihood of finding an efficient
`
`transform is increased. Some segments for which the transform engine 308 failed to generate
`
`an efficient transform 412 are also returned to the pre-processor 304 and mutated 408.
`
`By monitoring and storing information on which data type and segment sizes are
`
`returned to the pre-processor 304 to be mutated 408,
`
`the control program 306 gains
`
`information on which data types and segment sizes should be initially mutated. Further, by
`
`monitoring mutated segments through the compression system 300, the control program 306
`
`gains information on which mutations result in successful transforms. This information can
`
`be re—incorporated into the control program.
`
`There are several ways to mutate the segment. If the control program 306 has a record
`
`stored of which mutating technique has resulted in successful transforms, that mutating
`
`technique is used.
`
`If the control program 306 does not have such a record, the mutating
`
`technique is chosen by stepping through a stored library ofpossible mutations.
`
`In a first mutating technique, the bit—length of the segment is adjusted. This involves
`
`splitting a group of segments differently. By doing so, the pre—processor 306 sends different
`
`content values to the transform engine 308 to process. Another technique is “cutting and
`
`shuffling” the data. A predefined shuffle library is used. The index of the shuffle is included
`
`as part of the packet overhead. Yet another technique is finding the complement of the
`
`segment and sending the complement to the transform engine 308. In another technique, the
`
`bits within the segment are shifted or rotated. When the bits are shifted or rotated, the
`
`direction and number of bits to rotate or shift is sent as part of the packet overhead. The
`
`segment can be modified by an arithmetic (scaling) or logical (XORing) operation.
`
`In an
`
`alternate technique, other traditional compressions are used to change the content value of the
`
`segment.
`
`The control program 306 tracks mutated segments, and how the'mutated segments
`
`were mutated. This information is provided to the packager 310 so that the packager 310 can
`
`add the associated appropriate packet overhead codes. These codes identify the data as
`
`mutated data, as well as how the data has been mutated.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`12
`
`NetApp;Rackspace
`
`Exhibit1016
`
`Page13
`
`NetApp; Rackspace Exhibit 1016 Page 13
`
`

`

`WO 01/50325
`
`PCT/US00/35672
`
`The segments, mutated or not, are next output 410 to the transform engine 308.
`
`Transform Engine
`
`Fig. 5 is a flow chart of the transform engine 308. The segment output by the pre-
`
`processor 304 is input 410 to the transform engine 308. The segment is sent to as many
`
`transform processes 502 as are available and as computational resources allow. The segment
`
`is processed by the transform processes 502 either serially or in parallel.
`
`Each transform process 502 provides another transform (or transforms) available to
`
`represent the content value of the segment. The transform engine 308 has multiple transform
`
`processes 502 available to it since a single transform may not be appropriate to the content
`
`value of every input segment. This is seen from studying the example, discussed previously,
`
`of the 9-bit size transform abc , where each of a, b, and c is a 3-bit message. Using this
`
`transform, the content value of an integer over 700,000 decimal digits long can be transmitted
`
`using only nine bits. Such a number would occupy over 2.3 million bits if conventionally
`
`represented. While this transform provides the capability for nine bits to transmit a 2.3
`
`million-bit number, the nine bits still are only capable of conveying less than 512 different
`
`messages, corresponding to 512 different content values. Therefore, there exist a significant
`
`number of values between 0 and 777 that the transform ab” cannot represent.
`
`Normally, arithmetic transforms include

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket