Attachment 1f: Copy of Hsu from the University of Illinois at Urbana-Champaign Library
SOFTWARE
PRACTICE & EXPERIENCE

VOLUME 25, No. 7    JULY 1995

ISSN 0038-0644

EDITORS
DOUGLAS COMER
ANDY WELLINGS

WILEY
Chichester · New York · Brisbane · Toronto · Singapore
A Wiley-Interscience Publication

SPEXBL 25(7) 705–830 (1995)
SOFTWARE
PRACTICE & EXPERIENCE

VOLUME 25, No. 8    AUGUST 1995

ISSN 0038-0644

EDITORS
DOUGLAS COMER
ANDY WELLINGS

WILEY
Chichester · New York · Brisbane · Toronto · Singapore
A Wiley-Interscience Publication

SPEXBL 25(8) 831–946 (1995)
SOFTWARE
PRACTICE & EXPERIENCE

VOLUME 25, No. 9    SEPTEMBER 1995

EDITORS
DOUGLAS COMER
ANDY WELLINGS
SOFTWARE
PRACTICE & EXPERIENCE

VOLUME 25, No. 10    OCTOBER 1995

ISSN 0038-0644

EDITORS
DOUGLAS COMER
ANDY WELLINGS

WILEY
Chichester · New York · Brisbane · Toronto · Singapore
A Wiley-Interscience Publication

SPEXBL 25(10) 1065–1182 (1995)
SOFTWARE
PRACTICE & EXPERIENCE

Editors
Professor D. E. Comer, Computer Science Department, Purdue University, West Lafayette, IN 47907, U.S.A.
Charlene I. Tubis, U.S. Editorial Assistant, Computer Science Department, Purdue University, West Lafayette, IN 47907, U.S.A.
Dr A. J. Wellings, Department of Computer Science, University of York, Heslington, York YO1 5DD, U.K.

Advisory Editorial Board
Professor D. W. BARRON, Department of Electronics and Computer Science, University of Southampton, Southampton SO9 5NH, U.K.
Professor P. J. BROWN, Computing Laboratory, The University, Canterbury, Kent CT2 7NF, U.K.
Professor J. A. CAMPBELL, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, U.K.
Professor F. J. CORBATO, Electrical Engineering Department, Massachusetts Institute of Technology, 545 Technology Square, Cambridge, Massachusetts 02139, U.S.A.
Dr. Christopher W. FRASER, AT&T Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974-0635, U.S.A.
Professor PER BRINCH HANSEN, School of Computer and Information Science, Syracuse University, Syracuse, New York 13210, U.S.A.
Professor D. R. HANSON, Department of Computer Science, Princeton University, Princeton, New Jersey 08544, U.S.A.
Professor J. KATZENELSON, Faculty of Electrical Engineering, Technion-Israel Institute of Technology, Haifa, Israel.
Dr. B. W. KERNIGHAN, AT&T Bell Laboratories, Murray Hill, New Jersey, U.S.A.
Professor D. E. KNUTH, Department of Computer Science, Stanford University, Stanford, California 94305, U.S.A.
Dr B. W. LAMPSON, 180 Lake View Ave., Cambridge, MA 02138, U.S.A.
Dr C. A. LANG, Three Space Ltd., 70 Castle Street, Cambridge CB3 0AJ, U.K.
Professor B. RANDELL, Computing Laboratory, University of Newcastle upon Tyne, Claremont Tower, Claremont Road, Newcastle upon Tyne NE1 7RU, U.K.
Professor J. S. ROHL, Department of Computer Science, The University of Western Australia, Nedlands, Western Australia 6009.
D. T. ROSS, SofTech Inc., 460 Totten Pond Road, Waltham, Massachusetts 02154, U.S.A.
B. H. SHEARING, The Software Factory, 28 Padbrook, Limpsfield, Oxted, Surrey RH8 0DW, U.K.
Professor N. WIRTH, Institut für Computersysteme, ETH-Zentrum, CH-8092 Zürich, Switzerland.

Aims and Scope
Software—Practice and Experience is an internationally respected and rigorously refereed vehicle for the dissemination and discussion of practical experience with new and established software for both systems and applications. Contributions regularly: (a) describe detailed accounts of completed software-system projects which can serve as 'how-to-do-it' models for future work in the same field; (b) present short reports on programming techniques that can be used in a wide variety of areas; (c) document new techniques and tools that aid in solving software construction problems; and (d) explain methods/techniques that cope with the special demands of large scale software projects. The journal also features timely Short Communications on rapidly developing new topics.

The editors welcome papers which result from practical experience with tools and methods developed and used in both academic and industrial environments. The aim is to encourage practitioners to share their experiences with design, implementation and evaluation of techniques and tools for software and software systems.

Papers cover software design and implementation, case studies describing the evolution of systems and the thinking behind them, and critical appraisals of software systems. The journal has always welcomed tutorial articles describing well-tried techniques not previously documented in computing literature. The emphasis is on practical experience; articles with theoretical or mathematical content are included only in cases where an understanding of the theory will lead to better practical systems. Articles range in length from a Short Communication (half to two pages) to the length required to give full treatment to a substantial piece of software (40 or more pages).

Advertising: For details contact Michael J. Levermore, Advertisement Sales, John Wiley & Sons Ltd, Baffins Lane, Chichester, Sussex PO19 1UD, England (Telephone 01243 710351, Fax 01243 775878, Telex 85290).

Software—Practice and Experience (ISSN 0038-0644/USPS 890-920) is published monthly by John Wiley & Sons Limited, Baffins Lane, Chichester, Sussex, England. Second class postage paid at Jamaica, N.Y. 11431. Air freight and mailing in the U.S.A. by Publications Expediting Services Inc., 200 Meacham Avenue, Elmont, N.Y. 11003. © 1995 by John Wiley & Sons Ltd. Printed and bound in Great Britain by Page Bros, Norwich. Printed on acid-free paper.

To subscribe: orders should be addressed to Subscriptions Department, John Wiley & Sons Limited, Baffins Lane, Chichester, Sussex PO19 1UD, England. 1995 subscription price (13 issues): U.S. $825.00.

U.S.A. POSTMASTER: Send address changes to Software—Practice and Experience, c/o Publications Expediting Services Inc., 200 Meacham Avenue, Elmont, N.Y. 11003, U.S.A.
SOFTWARE—PRACTICE AND EXPERIENCE
(Softw. pract. exp.)

VOLUME 25, ISSUE No. 10    October 1995

Indexed or abstracted by Cambridge Scientific Abstracts, CompuMath Citation Index (ISI),
Compuscience Database, Computer Contents, Computer Literature Index, Computing
Reviews, Current Contents/Eng. Tech. & Applied Sciences, Data Processing Digest, Deadline
Newsletter, Educational Technology Abstracts, Engineering Index, Engineering Societies
Library, IBZ (International Bibliography of Periodical Literature), Information Science Abstracts
(Plenum), INSPEC, Knowledge Engineering Review, Nat. Centre for Software Technology,
Research Alert (ISI) and SCISEARCH Database (ISI).

CONTENTS

Migration in Object-oriented Database Systems—A Practical Approach:
C. Huemer, G. Kappel and S. Vieweg ........................................... 1065

Automatic Synthesis of Compression Techniques for Heterogeneous
Files: W. H. Hsu and A. E. Zwarico ............................................ 1097

A Tool for Visualizing the Execution of Interactions on a Loosely-coupled
Distributed System: P. Ashton and J. Penny .................................... 1117

Process Scheduling and UNIX Semaphores: N. Dunstan and I. Fris ................ 1141

Software Maintenance: An Approach to Impact Analysis of Objects
Change: S. Ajila .............................................................. 1155

SPEXBL 25(10) 1065–1182 (1995)
ISSN 0038-0644

`1096
`
`c. HUEMER, G. KAPPEI. AND s. mama“
`
`.
`
`for dccumrul pmdncunn mum"
`‘The Lcitsland—a new :00]
`20. A. Scheer and A. Hars.
`Fandel and G. Zépfe] (eds), Modern Production Concepts. Springer. “CHI”. 1""!- Pl“ 37" '8‘ 'd
`21. G. Kappel and S. Vieweg, “Database requirements for (TIM npphcutmm'. Juunm! 0f I’m-"m"
`Manufacturing Systems, 5, (4/5), 48—63 (1994).
`_
`,l.
`22. U. Schreier, ‘Databasc requirements of knowledge-huscd pruduclmn scheduling. and L()l;ll|l'([).13
`CIM perspective', in R. Agrawal (cd.). Proc. 19m International (‘nnfi-n-m‘e- 1m h-rv Luau
`ad
`Basas, Dublin, August 1993. pp. 710—711.
`'_ gummy
`23. R. Cattel] and J. Skeen, ‘Object operations benchmark'. A (1” Truruurmun rm Humhan .
`_
`.
`_
`l7, (1), 1—31 (1992).
`.
`- M- Carey. D. Dawn: and J. Naughton. The 007 hunchmurk'. I’rm' A('M SHIMUI) (rmf- 10'”
`
`SIGMOD Record, 22, (2), 12—21 (1993).
`
`Veritas Techs. LLC
`Exhibit 1026
`Page 078
`
`

Automatic Synthesis of Compression Techniques for
Heterogeneous Files

WILLIAM H. HSU
Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, IL 61801, U.S.A.
(email: bhsu@cs.uiuc.edu, voice: (217) 244-1620)

AND

AMY E. ZWARICO
Department of Computer Science, The Johns Hopkins University, Baltimore, MD 21218, U.S.A.
(email: amy@cs.jhu.edu, voice: (410) 516-5304)

SUMMARY
We present a compression technique for heterogeneous files, those files which contain multiple types of
data such as text, images, binary, audio, or animation. The system uses statistical methods to determine
the best algorithm to use in compressing each block of data in a file (possibly a different algorithm for
each block). The file is then compressed by applying the appropriate algorithm to each block. We obtain
better savings than possible by using a single algorithm for compressing the file. The implementation
of a working version of this heterogeneous compressor is described, along with examples of its value
toward improving compression both in theoretical and applied contexts. We compare our results with
those obtained using four commercially available compression programs, PKZIP, Unix compress, StuffIt,
and Compact Pro, and show that our system provides better space savings.

KEY WORDS: adaptive/selective data compression algorithms; redundancy metrics; heterogeneous files; program synthesis

SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 25(10), 1097–1116 (OCTOBER 1995)
CCC 0038-0644/95/101097-20
© 1995 by John Wiley & Sons, Ltd.
Received 20 April 1994
Revised 5 February 1995

INTRODUCTION

The primary motivation in studying compression is the savings in space that it provides.
Many compression algorithms have been implemented, and with the advent of new hardware
standards, more techniques are under development. Historically, research in data compression
has been devoted to the development of algorithms that exploit various types of redundancy
found in a file. The shortcoming of such algorithms is that they assume, often inaccurately,
that files are homogeneous throughout. Consequently, each exploits only a subset of the
redundancy found in the file.

Unfortunately, no algorithm is effective in compressing all files.1 For example, dynamic
Huffman coding works best on data files with a high variance in the frequency of individual
characters (including some graphics and audio data), achieves mediocre performance on
natural language text files, and performs poorly in general on high-redundancy binary data.
On the other hand, run length encoding works well on high-redundancy binary data, but
performs very poorly on text files. Textual substitution works best when multiple-character
strings tend to be repeated, as in English text, but this performance degrades as the average
length of these strings decreases. These relative strengths and weaknesses become critical
when attempting to compress heterogeneous files. Heterogeneous files are those which contain
multiple types of data such as text, images, binary, audio, or animation. Consequently,
their constituent parts may have different degrees of compressibility. Because most compression
algorithms are either tailored to a few specific classes of data or are designed to handle a
single type of data at a time, they are not suited to the compression of heterogeneous files.
In attempting to apply a single method to such files, they forfeit the possibility of greater
savings achievable by compressing various segments of the file with different methods.
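To make these trade-offs concrete, here is a minimal run-length coder (an illustrative sketch by the editor, not code from the paper; the sample inputs are invented): it shrinks run-heavy binary data dramatically but roughly doubles ordinary English text, where runs of length one dominate.

```python
def rle_encode(data: bytes) -> bytes:
    """Naive run-length coding: each maximal run becomes a (count, byte) pair."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

binary = b"\x00" * 4000 + b"\xff" * 1000       # long runs: high-redundancy binary
text = b"the quick brown fox jumps over the lazy dog " * 100
print(len(rle_encode(binary)) / len(binary))   # ~0.008: large savings
print(len(rle_encode(text)) / len(text))       # 2.0: RLE doubles plain text
```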
To overcome this inherent weakness found in compression algorithms, we have developed
a heterogeneous compressor that automatically chooses the best compression algorithm to
use on a given variable-length block of a file, based on both the qualitative and quantitative
properties of that segment. The compressor determines and then applies the selected
algorithm to the blocks separately. Assembling compression procedures to create a specifically
tailored program for each file gives improved performance over using one program for all
files. This system produces better compression results than four commonly available
compression packages, PKZIP,2 Unix compress,3 StuffIt,4 and Compact Pro,5 for arbitrary
heterogeneous files.

The major contributions of this work are twofold. The first is an improved compression
system for heterogeneous files. The second is the development of a method of statistical
analysis of the compressibility of a file (its redundancy types). Although the concept of
redundancy types is not new,6,7 synthesis of compression techniques using redundancy
measurements is largely unprecedented. The approach presented in this paper uses a
straightforward program synthesis technique: a compression plan, consisting of instructions
for each block of input data, is generated, guided by the statistical properties of the input
data. Because of its use of algorithms specifically suited to the types of redundancy exhibited
by the particular input file, the system achieves consistent average performance throughout
the file, as shown by experimental evidence.
As an example of the type of savings our system produces, consider compressing a
heterogeneous file (such as a small multimedia data file) consisting of 10K of low-redundancy
(non-natural language) ASCII data, 10K of English text, and 25K of graphics. In this case, a
reasonably sophisticated compression program might recognize the increased savings
achievable by employing Huffman compression, to better take advantage of the fact that the
majority of the data is graphical. However, none of the general-purpose compression methods
under consideration are optimal when used alone on this file. This is because the text part
of this file is best compressed by textual substitution methods (e.g., Lempel-Ziv) rather than
statistical methods, while the low-redundancy data* and graphics parts are best compressed
by alphabetic distribution-based methods (e.g., arithmetic or dynamic Huffman coding) rather
than Lempel-Ziv or run-length encoding. This particular file totals 45K in length before
compression. A compressor using pure dynamic Huffman coding only achieves about 7 per
cent savings, for a compressed file of length 42.2K. One of the best general-purpose
Lempel-Ziv compressors currently available achieves 18 per cent savings, producing a
compressed file of length 37.4K. Our system uses arithmetic coding on the first and last
segments and Lempel-Ziv compression on the text segment in the middle, achieving a 22 per
cent savings and producing a compressed file of length 35.6K. This is a 4 per cent
improvement over the best commercial system.

* This denotes, in our system, a file with a low rate of repeated strings.
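As a quick check of this arithmetic (the editor's, not the authors'): space savings is one minus the ratio of compressed to original size. Recomputing from the rounded kilobyte figures gives values slightly below the quoted 7, 18 and 22 per cent, presumably because the paper worked from exact byte counts.

```python
original_kb = 45.0
for method, compressed_kb in [("dynamic Huffman only", 42.2),
                              ("Lempel-Ziv only", 37.4),
                              ("per-segment selection", 35.6)]:
    savings = 100.0 * (1.0 - compressed_kb / original_kb)
    print(f"{method}: {savings:.1f} per cent")
# dynamic Huffman only: 6.2 per cent
# Lempel-Ziv only: 16.9 per cent
# per-segment selection: 20.9 per cent
```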
The purpose of our experiments was to verify the conjecture that a selective system for
combining methods can improve savings on a significant range of heterogeneous files,
especially multimedia data. This combination differs from current adaptive methods in that
it switches among compression paradigms designed to remove very different types of
redundancy. By contrast, existing adaptive compression programs are sensitive only to
changes in particular types of redundancy, such as run-length, which do not require changing
the underlying algorithm used in compression. Thus they cannot adapt to changes in the
type of redundancy present, such as from high run-length to high character repetition. The
superiority of our approach is demonstrated in our experimental section.

This paper begins with a presentation of existing approaches to data compression, including
a discussion of hybrid and adaptive compression algorithms and a description of four popular
commercial compression packages. These are followed by documentation on the design of
the heterogeneous compression system, analysis of experimental results obtained from test
runs of the completed system, and comparison of the system's performance against that of
commercial systems. Finally, implications of the results and possibilities for future work
are presented.
RELATED WORK

It is a fundamental result of information theory that there is no single algorithm that
performs optimally in compressing all files.1 However, much work has been done to develop
algorithms and techniques that work nearly optimally on all classes of files. In this section
we discuss adaptive algorithms, composite algorithms, and four popular commercial
compression packages.

Adaptive compression algorithms and composite techniques
Exploiting the heterogeneity in a file has been addressed in two ways: the development
of adaptive compression algorithms, and the composition of various algorithms. Adaptive
compression algorithms attune themselves gradually to changes in the redundancies within a
file by modifying parameters used by the algorithm, such as the dictionary, during execution.
For example, adaptive alphabetic distribution-based algorithms such as dynamic Huffman
coding10 maintain a tree structure to minimize the encoded length of the most frequently
occurring characters. This property can be made to change continuously as a file is processed.

An example of an adaptive textual substitution algorithm is Lempel-Ziv compression,
a title which refers to two distinct variants of a basic textual substitution scheme. The
first variant, known as LZ77 or the sliding dictionary or sliding window method, selects
positional references from a constant range of preceding strings. These 'back-pointers'
literally encode position and length of a repeated string. The second variant, known as
LZ78 or the dynamic dictionary method, uses a dictionary structure with a paging heuristic.
When the dictionary, a table of strings from the file, is completely filled, the contents
are not discarded. Instead, an auxiliary dictionary is created and updated while compression
continues using the main dictionary. Each time this auxiliary table is filled, its contents are
'swapped' into the main dictionary and it is cleared. The maintenance of dictionaries for
textual substitution is analogous to the semi-space method of garbage collection, in which
two pages are used but only one is 'active'; these are exchanged when one fills beyond
a preset threshold. Another adaptive variant of this algorithm is the Lempel-Ziv-Welch
(LZW) algorithm, a descendant of LZ78 used in Unix compress. Both LZW and LZ78
vary the length of strings used in compression.
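The dictionary-growth idea behind LZ78 and LZW can be sketched in a few lines (an editor's illustration only; Unix compress adds variable-width output codes and the table-reset heuristic mentioned later, both omitted here):

```python
def lzw_encode(data: bytes) -> list[int]:
    """Minimal LZW: emit a code for the longest known string, then extend
    the dictionary with that string plus one following byte."""
    table = {bytes([b]): b for b in range(256)}   # codes 0-255: single bytes
    next_code = 256
    out: list[int] = []
    s = b""
    for c in data:
        sc = s + bytes([c])
        if sc in table:
            s = sc                    # keep extending the current match
        else:
            out.append(table[s])      # emit code for the longest match
            table[sc] = next_code     # learn the new, longer string
            next_code += 1
            s = bytes([c])
    if s:
        out.append(table[s])
    return out

print(lzw_encode(b"abababababab"))    # [97, 98, 256, 258, 257, 260]: 6 codes for 12 bytes
```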
Yet another adaptive (alphabetic distribution-based) compression scheme, the Move-To-Front
(MTF) method, was developed by Bentley et al.13 In MTF, the 'word code' for a symbol is
determined by the position of the word in a sequential list. The word list is ordered so that
frequently accessed words are near the front, thus shortening their encodings.
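A byte-level rendition of MTF makes the mechanism plain (a simplified sketch; the scheme of Bentley et al. codes words rather than single bytes):

```python
def mtf_encode(data: bytes) -> list[int]:
    """Move-To-Front: output each symbol's current list position, then move
    that symbol to the front so frequently used symbols get small codes."""
    symbols = list(range(256))
    out = []
    for b in data:
        i = symbols.index(b)
        out.append(i)
        symbols.insert(0, symbols.pop(i))
    return out

print(mtf_encode(b"aaabbbaaa"))   # [97, 0, 0, 98, 0, 0, 1, 0, 0]
```

After the first occurrence of each symbol, repeats cost only small integers, which a back-end statistical coder can then encode compactly.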
Adaptive compression algorithms are not appropriate to use with heterogeneous files
because they are sensitive only to changes in the particular redundancy type with which
they are associated, such as a change in the alphabetic distribution. They do not exploit
changes across different redundancy types in the files. Therefore a so-called adaptive method
typically cannot directly handle drastic changes in file properties, such as an abrupt transition
from text to graphics. For example, adaptive Huffman compressors specially optimized for
text achieve disproportionately poor performance on certain image files, and vice versa. As
the use of multimedia files increases, files exhibiting this sort of transition will become
more prevalent.

Our approach differs from adaptive compression because the system chooses each algorithm
(as well as the duration of its applicability) before compression begins, rather than modifying
the technique for each file during compression. In addition, while adaptive methods make
modifications to their compression parameters on the basis of single bytes or fixed-length
strings of input, our heterogeneous compressor bases its compression upon statistics gathered
from larger blocks of five kilobytes. This allows us to handle much larger changes in file
redundancy types. This makes our system less sensitive to residual statistical fluctuations
from different parts of a file. We note that it is possible to use an adaptive algorithm as a
primitive in the system.
Another approach to handling heterogeneous files is the composition of compression
algorithms. Composition can either be accomplished by running several algorithms in
succession or by combining the basic algorithms and heuristics to create a new technique.
For example, recent implementations of 'universal' compression programs execute the
Lempel-Ziv algorithm and dynamic Huffman coding in succession, thus improving performance
by combining the string repetition-based compression of Lempel-Ziv with the frequency-based
compression strategy of dynamic Huffman coding. One commercial implementation is
LHarc.14,15 Our system exploits the same savings since it uses the Freeze implementation
of the Lempel-Ziv algorithm, which filters Lempel-Ziv compressed output through a
Huffman coder. An example of a truly composite technique is the compression achieved
by using Shannon-Fano tries* in conjunction with the Fiala-Greene algorithm (a variant
of Lempel-Ziv)16 in the PKZIP2 commercial package. Tries are used to optimally encode
strings by character frequency.17 PKZIP was selected as the representative test program from
this group in our experiment due to its superior performance on industrial benchmarks.9

Our approach generalizes the ideas of successively executing or combining different
compression algorithms by allowing any combination of basic algorithms within a file. This
includes switching among algorithms an arbitrary number of times within a file. The
algorithms themselves may be simple or composite and may be adaptive. All are treated as
atomic commands to be applied to portions of a file.
* A trie is a tree of variable degree ≥ 2 such that (1) each edge is labelled with a character, and the depth of any node
represents one more than the number of characters required to identify it; (2) all internal nodes are intermediate and represent
prefixes of keys in the trie; (3) keys (strings) may be inserted as leaves using the minimum number of characters which
distinguish them uniquely. Thus a generic trie containing the strings computer and compare would have keys at a depth of
five which share a common prefix of length four.
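The substitution-then-statistical cascade described above survives today in the DEFLATE format, so Python's standard zlib module offers a convenient way to observe a composite coder at work (an illustration of the composition idea, not the LHarc or Freeze implementation):

```python
import zlib

data = b"to be or not to be, that is the question " * 200
# DEFLATE composes LZ77 string matching with Huffman coding of the
# resulting literals and match lengths: the same two-stage cascade.
packed = zlib.compress(data, 9)
print(len(data), "->", len(packed))   # 8400 -> on the order of 100 bytes
```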
The problem of heterogeneous files was addressed by Toal18 in a proposal for a naive
heterogeneous compression system similar to ours. In such a system, files would be
segmented into fixed-length encapsulated blocks; the optimal algorithm would be selected for
each block on the basis of their simple taxonomy (qualitative data type) only; and the blocks
would be independently compressed. Our system, however, performs more in-depth statistical
analysis in order to make a more informed selection from the database of algorithms. This
entails not only the determination of qualitative data properties but the computation of
metrics for an entire block (as opposed to sporadic or random sampling from parts of each
block). Furthermore, normalization constants for selection parameters (i.e. the redundancy
metrics) are fitted to observed parameters for a test library. Finally, a straightforward but
crucial improvement to the naive encapsulated-block method is the implementation of a
multi-pass scheme. By determining the complete taxonomy (data type and dominant
redundancy type) in advance, any number of contiguous blocks which use the same compression
method will be treated as a single segment. Toal observed in preliminary experiments that
the overhead of changing compression schemes from one block to another dominated the
additional savings that resulted from selection of a superior compression method.18 This
overhead is attributable to the fact that blocks compressed independently (even if the same
method is used) are essentially separate files and assume the same startup overhead of the
compression algorithm used.* We have determined experimentally that merging contiguous
blocks whenever possible obviates the large majority of changes in compression method.
This eliminates a sufficient proportion of the overhead to make heterogeneous compression
worthwhile.

* For purposes of comparison, the block sizes tested by Toal were nearly identical to those used in our system (ranging
upwards from 4K).
Commercial products

One of the goals of this research was to develop a compression system which is generally
superior to commercially available systems. The four systems we studied are PKZIP,
developed for microcomputers running MS-DOS;2 Unix compress;3 and StuffIt Classic4 and
Compact Pro,5 developed for the Apple Macintosh operating system. Each of these products
performs its compression in a single pass, with only one method selected per file. Thus, the
possibility of heterogeneous files is ignored.

Unix compress uses an adaptive version of the Lempel-Ziv algorithm. It operates by
substituting a fixed-length code for common substrings. compress, like other adaptive
textual substitution algorithms, periodically tests its own performance and reinitializes its
string table if the amount of compression has decreased.

StuffIt makes use of two sets of algorithms: it first detects special-type files such as
image files and processes them with algorithms suited for high-resolution color data; for the
remaining files, it queries the operating system for the explicit file type given when the file
was created, and uses this information to choose either the LZW variant of Lempel-Ziv,
dynamic Huffman coding, or run-length encoding. This is a much more limited selection
process than what we have implemented. Additionally, no selection of compression methods
is attempted within a file. Compact Pro uses the same methodology as StuffIt and compress,
but incorporates an improved Lempel-Ziv derived directly from LZ77. The public-domain
version of StuffIt is derived from Unix compress, as is evident from the similarity of their
performance results.
Compression systems such as StuffIt perform simple selection among alternative
compression algorithms. The important problem is that they are underequipped for the task
of fitting a specific technique to each file (even when the uncompressed data is homogeneous).
StuffIt uses few heuristics, since its algorithms are intended to be 'multipurpose'. Furthermore,
only the file type is considered in selecting the algorithm; that is, no measures of redundancy
are computed. Earlier versions of StuffIt (which were extremely similar to Unix compress)
used composite alphabetic and textual compression, but made no selections on the basis of
data characteristics. The chief improvements of our heterogeneous compressor over this
approach are that it uses a two-dimensional lookup table, indexed by file properties and
quantitative redundancy metrics, and, more important, that it treats the file as a collection
of heterogeneous data sets.
THE HETEROGENEOUS COMPRESSOR
Our heterogeneous compressor treats a file as a collection of fixed size blocks (5K in
the current implementation), each containing a potentially different type of data and thus
best compressed using different algorithms. The actual compression is accomplished in
two phases. In the first phase, the system determines the type and compressibility of each
block. The compressibility of each block of data is determined by the values of three
quantitative metrics representing the alphabetic distribution, the average run length and the
string repetition ratio in the file. If these metrics are all below a certain threshold, then the
block is considered fully compressed (uncompressible) and the program continues on to the
next block. Otherwise, using the block type and largest metric, the appropriate compression
algorithm (and possible heuristic) is chosen from the compression algorithm database. The
compression method for the current block is then recorded in a small array-based map of
the file, and the system continues.

The second phase comprises the actual compression and an optimization that maximizes
the size of a segment of data to be compressed using a particular algorithm. In this
optimization, which is interleaved with the actual compression, adjacent blocks for which
exactly the same method has been chosen are merged into a single block. This merge
technique maximizes the length of segments requiring a single compression method by
greedily scanning ahead until a change of method is detected. Scanning is performed using
the array map of the file generated when compression methods were selected from the
database. A compression history, needed for decompression, is automatically generated as
part of this phase.

The newly compressed segments are written to a buffer by the algorithm, which stores
the output data with the compression history. The system then writes out the compressed
file and exits with a signal to the operating system that compression was successful.
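The two phases can be summarized in the following sketch (the editor's paraphrase, not the authors' code: the metrics and thresholds are crude stand-ins, and zlib's strategies merely substitute for the system's database of compression algorithms):

```python
import zlib

BLOCK = 5 * 1024   # 5K blocks, as in the current implementation

def _deflate(strategy):
    def encode(seg: bytes) -> bytes:
        c = zlib.compressobj(9, zlib.DEFLATED, 15, 8, strategy)
        return c.compress(seg) + c.flush()
    return encode

# Stand-ins for the algorithm database: one coder per redundancy family.
ENCODERS = {
    "run_length": _deflate(zlib.Z_RLE),
    "alphabetic": _deflate(zlib.Z_HUFFMAN_ONLY),
    "textual":    _deflate(zlib.Z_DEFAULT_STRATEGY),
    "store":      lambda seg: seg,        # 'uncompressible' blocks pass through
}

def choose_method(block: bytes) -> str:
    """Phase 1: pick a method from crude proxies for the three metrics."""
    runs = 1 + sum(1 for x, y in zip(block, block[1:]) if x != y)
    avg_run = len(block) / runs           # average run length
    distinct = len(set(block))            # proxy for alphabetic distribution
    if avg_run < 1.05 and distinct > 200: # all metrics low: leave as-is
        return "store"
    if avg_run > 4:
        return "run_length"
    if distinct < 64:
        return "alphabetic"
    return "textual"

def compress(data: bytes) -> list[tuple[str, bytes]]:
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    plan = [choose_method(b) for b in blocks]      # phase 1: array map of the file
    segments, i = [], 0
    while i < len(blocks):                         # phase 2: merge, then compress
        j = i
        while j + 1 < len(blocks) and plan[j + 1] == plan[i]:
            j += 1                                 # greedy scan-ahead
        seg = b"".join(blocks[i:j + 1])            # same-method blocks merge
        segments.append((plan[i], ENCODERS[plan[i]](seg)))
        i = j + 1
    return segments   # (method, payload) pairs double as the compression history
```

A decompressor would simply replay the (method, payload) pairs in order, which is the role the compression history serves.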
From this two-pass scheme it is straightforward to see why this system is less susceptible
than traditional adaptive systems to biases accrued when the data type changes abruptly
during compression. Adaptive compressors perform all operations myopically, sacrificing
the ability to see ahead in the file or data stream to detect future fluctuations in the type
of data. As a result, adaptive compressors retain the statistical vestiges of the old method
until these are 'flushed out' by new data (or balanced out, depending upon the process for
paging and aging internal data structures such as dictionaries). Thus adaptive compressors
may continue to suffer the effects of bias, achieving suboptimal compression. On the other
hand, by abruptly changing compression algorithms, our technique completely discards all
remnants of the 'previous' method (i.e. the algorithm used on the preceding segment). This
allows us to immediately capitalize on changes in data. In addition, merging contiguous
blocks of the same data type acquires the advantage of incurring all the overhead at once
for switching to what will be the best compression method for an entire variable-length
segment. The primary advantage of adaptive compression techniques over our technique is
that the adaptive compression algorithms are 'online' (single-pass). This property increases
compression speed and, more important, gives the ability to com
