throbber
AUTOMATIC SYNTHESIS OF COMPRESSION TECHNIQUES FOR HETEROGENEOUS FILES 1111
`
`Table II. Files used to compose the test suite and their respective origins
`j_a
`File
`File
`File
`name
`
`designation
`audio]
`
`lowrdl
`lowrdl
`lowrd3
`lowrd4
`
`text 1
`text}
`text4
`text5
`text6
`text?
`
`execu I
`execu2
`execu3
`execu4
`execu5
`execu6
`
`graph 1
`graph2
`graph3
`graph-4
`graph5
`graph6
`graph?
`
`objecl
`objec2
`objec3
`
`Cosby . snd
`
`tiCker.txt
`exsound
`huff
`
`appnote.uue
`
`phrack.txt
`techbook.txt
`
`quanta] .txt
`attil la. fluff
`shadow. fluff
`
`quanta2.txt
`
`ad
`sh
`blob
`zero
`
`network2.exe
`hostname
`
`compmisc.drw
`comppertdrw
`computer.drw
`lowres ,mpt
`3dbar. drw
`
`image.ppm
`sr1>4
`
`testl.o
`test2..o
`test3.o
`
`WP‘?
`SoundMaster Macintosh audio file
`
`ASCII characters from stock ticker
`
`compressed World Builder sound library
`compressed Unix executable
`uuencoded text
`
`English text
`Unix news article
`English text
`English text
`English text
`English text
`
`Unix executable
`Unix executable
`
`Silicon Graphics executable
`Silicon Graphics executable
`IBM PC executable
`Unix executable
`
`botus Freelance line drawing
`Lotus Freelance line drawing
`Lotus Freelance line drawing
`MacPaint file
`Lotus Freelance 3-D bar chart
`PPM (high-resolution image) file
`MacPaint file
`
`Unix object file
`Unix object file
`Unix object file
`
`C source code
`tab1e.c
`source I
`
`freezec C source code
`source2
`
`The average of each column appears in the bottom row; note that
`commercial compressor.
`the ‘percent difference’ averages are not weighted by file length, as each file is considered
`a separate experiment.
`by the synthesis system depends on that of the algo-
`Because the quality of compression
`the implementations that we use should yield
`rithms and heuristics used, improvement of
`d by comparing the results of compressing a file dom-
`higher performance. This is evidence
`ompress and Compact Pro. Both are implementations
`mated by string repetitions by Unix c
`has no heuristics, whereas Compact Pro is
`of the Lempel—Ziv algorithm. Unix compress
`5' “ Compact Pro consistently outperforms compress. It
`a better implementation of LZ77.
`f the Freeze variant of Lempel—Ziv3 used in our sys-
`should be noted that the performance 0
`
` DELL INC., EMC CORP., HPE CO., HPES, LLC -
`DELL lNC., EMC CORP., HPE C0., HPES, LLC -
` Ex. 1026, p. 147 of 152
`Ex. 1026, p. 147 of 152
`
`

`
`W. H. HSU AND A. E. ZWARICO
`
`Table Ill. Combinations of the test files and the resultant simulated data types
`_::
`Classification of
`File
`File
`data modeled
`number
`composition
` __
`news or stock report
`1
`textl — lowrdl
`object file for a graphics viewer
`2
`graph? — objecl
`multimedia application (textlgraphics)
`3
`lowrdl ~— text3 —- graph4
`graphics viewer
`4
`graph? —-— execu3
`multimedia data file (sound/graphics)
`5
`audiol —- graph}
`multimedia data file (text/graphics)
`6
`textl — lowrdl — graph3
`commercial utility
`7
`lowrd3 — execul
`multimedia application
`3
`graph2 —— lowrdl — execu2
`(graphicslsoundlexecutable)
`multimedia data or source file
`
`sourcel — lowrd3 — graph6
`
`audio] -4 text4
`lowrdl —» exccu4
`
`graph’) —— text5
`lowrd2 — text6
`text} — audio] — graph5
`lowrdl — text4 — sourcez
`
`text? — lowrd2 — graph3
`
`graph4 — audiol — execu5
`execu4 —- graph? -— text4
`objcc3 — 1owrd3 —- execufi
`objec2 —— audio] — execu2
`
`(source/compressed binarylimagc)
`multimedia data file (soundftext)
`statistical application with data
`multimedia data file (textlgraphics)
`multimedia data file (soundftext)
`multimedia data file (text/soundlgraphics)
`source file for multimedia program
`(text/source code)
`multimedia data file
`(text/compressed audiolgraphics)
`multimedia application (sound/graphics)
`multimedia application (graphicsltext)
`commercial utility
`audio application
`
`better than compress and is comparable to Compact Pro on standard
`tern does consistently
`Improving algorithms and adding or substituting new heuristics
`industrial benchmarks."
`would also yield more savings.
`
`Execution times and speed optimizations
`In this section we compare, in approximate tmits, the running time of the heterogeneous
`compressor against those of the four commercial systems the savings rates of which for our
`test files are documented above. The units are approximate for two reasons. First, because
`the four test systems are commercial the source code for three of them is not publicly
`available“. which renders an exact measure of user time infeasible. This concern is in part
`assuaged by the non—multitasked, single—user nature of the microcomputer operating systems
`on which three (compress for Linux notwithstanding) of the commercial systems reside.
`Second, however, the drastic architectural and organizational differences among the various
`native machines renders uniform comparisons unreliable. This applies even to normalized
`execution times because the host machines differ not merely in clock cycle speed, but
`in instruction set architecture and dynamic instruction frequencies for similar compression
`algorithms. The exact running times reported in this section is only that of the heterogeneous
`‘ As noted. however. the Lernpe|—Ziv implementation employed by Sluflir C!u.rsi'c is nearly identical to that of Unix compress.
`
` DELL INC., EMC CORP., HPE CO., HPES, LLC -
`DELL lNC., EMC CORP., HPE C0., HPES, LLC -
` Ex. 1026, p. 148 of 152
`Ex. 1026, p. 148 of 152
`
`

`
`AUTOMATIC SYNTHESIS OF COMPRESSION TECHNIQUES FOR HETEROGENEOUS FILES 1113
`
`compressor. These comprise the non-commercial‘ compression systems for which source
`code is available for profiling. For the commercial systems we report the observed wall
`clock time to provide a standard of comparison, but note that the host machines vary in
`computational power.
`
`Table IV. Results of the four popular commercial programs and the heterogeneous compression system,
`applied to the 20 test files
`
`File
`number
`
`8;;:aa::::a“““°”e--
`
`Original
`length
`39,348
`44,202
`46,629
`59,254
`169,108
`100,476
`131,663
`220,644
`301,805
`255,306
`59,305
`51,715
`63,189
`196,789
`148,908
`164,535
`203,912
`200,640
`366,557
`278,152
`
`Unix
`compress
`20,578
`44,202
`46,629
`52,076
`168,903
`69,771
`131,663
`190,971
`145,993
`204,457
`30,178
`51,715
`63,189
`176,276
`73,555
`141,067
`203,912
`128,675
`265,114
`223,277
`
`PKZIP
`v1.10
`17,1 19
`39,813
`46,629
`40,571
`151,478
`53,043
`103,544
`137,886
`1 12,503
`191,378
`22,782
`43,032
`58,247
`196,789
`63,748
`132,992
`184,657
`107,728
`198,727
`193,980
`
`Stufilt
`Classic
`20,575
`40,412
`43,261
`45,202
`149,701
`65,417
`106,643
`173,677
`137,685
`206,193
`29,701
`46,462
`59,569
`172,486
`75,595
`135,245
`189,398
`125,461
`265,027
`224,943
`
`Compact Heterogeneous
`Pro v1.32
`compressor
`16,831
`16,315
`41,112
`37,388
`40,367
`36,477
`41,607
`38,007
`148,917
`134,524
`52,349
`50,906
`109,979
`96,429
`137,401
`127,384
`1 15,096
`103,730
`183,313
`168,675
`22,858
`21,774
`44,107
`40,229
`59,934
`54,481
`151,057
`137,052
`64,618
`63,778
`110,093
`104,175
`202,821
`170,564
`104,711
`101,674
`198,756
`137,659
`191,763
`181,030
`
`Total
`
`3,102,137
`
`2,432,201
`
`2,096,646
`
`2,312,653
`
`2,037,690
`
`1,872,251
`
`The running times for the commercial systems on the entire test suite documented above
`appear in Table VI. All of the execution times are measured in wall clock units except for
`the heterogeneous compressor’s, which is a total of user times as reported by prof, the C
`profiler under Unix. The wall clock time was empirically observed not to differ noticeably
`from this total on an unloaded Unix machine. The commercial systems were similarly tested
`on unloaded (or sing1e—task) systems.
`For Unix compress, the mean running time was 26 s, where the average was taken
`over runs on different Sun workstations of comparable power (documented below). A Unix
`implementation of PKZIP was also tested on one of these Sun workstations, and achieved
`an execution time of 56 s — only slightly better than the personal computer version. The
`mnning time of 856 s placed the heterogeneous compressor in the middle to high end of
`the commercial compressors in terms of mnning time.
`
`“ For this purpose we continue to consider Unix compress commercial, due to its wide range of versions.
`
` DELL INC., EMC CORP., HPE CO., HPES, LLC -
`DELL lNC., EMC CORP., HPE CO., HPES, LLC -
` Ex. 1026, p. 149 of 152
`Ex. 1026, p. 149 of 152
`
`

`
`1114
`
`w. H. HSU AND A. E. zwatuco
`
`NOE--IONLII-I’-‘A-U-'ll\’l'-t
`
`Table V. Percent savings for the test compression systems‘
` ._?___
`File
`Unix
`PKZIP
`Siufilr
`Compact
`Heterogeneous
`Best
`Average
`number
`compress
`v1.10
`Classic
`Pm v1.32
`compressor
`win
`win
`(% saved)
`(% saved)
`(91: saved)
`{% saved)
`(90 saved)
`(91: diff.)
`(96 diff.)
`47-70
`56-49
`47-71
`57-23:
`58-54
`1-31
`6-25
`0-00
`9-93:
`8-57
`6-99
`15-42
`5-49
`9-04
`0-00
`0-00
`7-22
`13-43*
`21-77
`8-34
`16-61
`12-11
`31-53:
`23-71
`29-78
`35-86
`4-33
`11-57
`0-12
`10-43
`11-48
`1 1-94*
`20-45
`8-51
`11-96
`30-56
`47-21
`34-89
`47-90:
`49-34
`1-44
`9-20
`0-00
`21-36:
`19-00
`16-47
`26-76
`5-40
`12-55
`13-45
`37-51
`21-29
`37-73-:
`42-27
`4-54
`14-77
`51-63
`62-72:
`54-38
`61-86
`65-63
`2-9]
`7-98
`19-92
`25-04
`19-24
`28-20:»
`33-93
`5-73
`10-83
`49-11
`61 -59*
`49-92
`61-46
`63-28
`1-70
`7-77
`0-00
`1679-
`10-16
`14-71
`22-21
`5-42
`11-80
`0-00
`7-82:
`5-73
`5-15
`13-78
`5-96
`9-11
`10-42
`0-00
`12-35
`23-24*
`30-36
`7-12
`18-85
`50-60
`57-19»:
`49-23
`56-61
`57-17
`-0-02
`3-76
`14-26
`19-17
`17-80
`33-09:
`36-69
`3-60
`15-60
`0-00
`9-44:
`7-12
`0-54
`16-35
`6-91
`12-08
`35-87
`46-31
`37-47
`47-81:
`49-33
`1-51
`7-46
`27-67
`45-79:
`27-70
`45-78
`48-80
`3-02
`12-07
`19-73
`30-26
`19-13
`31 -06-:
`34-92
`3-86
`9-87
`
`10-96
`4-35
`37-14
`31 -55*
`24-21
`29-83
`19- I 6
`Average
`__:?.
`
`' The starred entry in each row is the best commercial system.
`
`CONCLUSIONS
`
`Analysis of results
`
`This project was successful on
`pression plans from encapsulated primitives for heterogeneous files was illustrated. The us
`of property analysis and redundancy metrics was experimentally successful. the latter veri
`fying the applicability of statistical data analysis to automatic programming in this domai
`The positive test results obtained with the primitive database currently available woul
`probably be even better with improved implementations of the algorithms and heuristic
`The statistical foundations of the heterogeneous system proved strong enough to be of de
`inite relevance to the operating systems community, and might be useful in an infonnatio
`theoretic context. The benefits of data compression are ubiquitous in that savings throug
`compression are independent of hardware and storage capabilities; selective techniques in
`crease these savings by a significant factor for heterogeneous files.
`
`Future work
`
`The sampling method may be improved in future implementations by
`The increase in analysis accuracy that this would bring would demand more primitives an
`heuristics -— such need would arise in any case with the continuing development of ne
`files types, such as high-resolution animation and three—dimensional images.
`
` DELL INC., EMC CORP., HPE CO., HPES, LLC -
`DELL |NC., EMC CORP., HPE C0., HPES, LLC -
` Ex. 1026, p. 150 of 152
`Ex. 1026, p. 150 of 152
`
`

`
`AUTOMATIC SYNTHESIS OF COMPRESSION TECHNIQUES FOR HETBROGENEOUS FILES 1115
`
`Table VI. Execution times of the heterogeneous and commercial compressors
`
`Compression system
`
`Unix compress
`PKZIP v1.l0
`Srufflt Classic
`
`Execution time
`(s)
`w 26
`67
`1 152
`
`Execution time
`(min)
`0:26
`1:07
`19:12
`
`Compact Pro V1.32
`Heterogeneous compressor
`
`1594
`856
`
`26:34
`14:56
`
`In the current system, lossy compression methods can be applied only if an entire file
`is found to be of a lossily compressible data type. Typically, these include high-resolution
`images (for JPEG) and speech, general high-definition audio, and high-resolution animation
`files. A special case could be implemented specifying that when an entire file matching a
`single lossily compressible data type (i.e. a homogeneous loss-perrnissible file) is found,
`the lossy algorithm may be applied.
`The difficulty is that without explicit information on where loss—permissible portions of
`a heterogeneous (e.g. multimedia) file begin and end, the compressor cannot absolutely
`guarantee that no data will be distorted which the user is not willing to have distorted.
`Thus no lossy methods can be safely applied to any segment in the block-based system.
`Thus a heterogeneous system would require either full interactive guidance from a user
`who could inspect the file or knew its contents, or would require improved magic numbers
`which encoded the lengths of loss-permissible segments. The heterogeneous system could
`then scan for these codes during the property analysis phase and preempt or modify metric-
`based selection if a lossy algorithm is warranted. The latter approach seems far superior
`to interactive compression, which places an intolerable burden of responsibility on users
`(consider a multimedia file with hundreds of interspersed digitized photographs).
`Another improvement worth considering is the use of a ratings system for specialized
`(especially lossy) compression algorithms such as JPEG and MPEG. For example, by des-
`ignating RLE compression ‘0 per cent alphabetic distribution, 100 per cent run length, 0
`per cent string repetition’ and by defining its single-type counterparts similarly, a standard
`can be established. Unix compress, for instance, might rate ‘40 per cent AD, 0 per cent
`RL, 60 per cent SR‘ and a hypothetical algorithm X might rate ‘25 per cent AD, 50 per cent
`RL, 25 per cent SR’ . The rating standard would correspond to the metric rating system for
`files which our system uses, and would help in analysis of the perfonnance of composite
`compression techniques (which handle multiple redundancy types). Non-synthesized com-
`posite techniques exist, both adaptive and non-adaptive, though results are not as promising
`as those of automatically generated techniques.
`Finally, it is clear from the frequency of duplicate entries in the algorithm lookup table
`that the database of primitives used in this heterogeneous system may not be as well—stocked
`as it optimally could be. Storer’ lists a plethora of optional heuristics which are applicable
`to Lempel—Ziv compression, specifically in augmenting and deleting from the dictionary.
`
`ACKNOWLEDGEMENTS
`
`This paper was produced as part of a research project at Johns Hopkins University. We
`are grateful to the faculty and staff of the JHU Computer Science Department, and to the
`Brown University CS Department, for their assistance throughout this work.
`
` DELL INC., EMC CORP., HPE CO., HPES, LLC -
`DELL lNC., EMC CORP., HPE CO., HPES, LLC -
` Ex. 1026, p. 151 of 152
`Ex. 1026, p. 151 of 152
`
`

`
`1116
`w. H. HSU AND A. E. zwmuco
`We would like to thank Leonid Broukhis, Graham Toal. and Kenneth Zeger for discus-
`sions on some of the research reported here. We also thank Jonathan Eifrig, Bill Goodman.
`and Tom Lane for guidance on several technical issues. Finally, we thank the anonymous re-
`viewers for their comments and suggestions, especially for introduction to relevant literature
`in arithmetic coding.
`
`REFERENCES
`
`James A. Storer, Data Compression: Methods and Theory, Computer Science Press. Rocl-tville, MD. 1988.
`Phillip W. Katz, PKZIP. Commercial compression system, version 1.1, 1990.
`Sun Microsystems. compress. Commercial compression system. operating system version 5.3, September
`1992.
`Raymond Lau. Stufllt Classic and Stufflt Deluxe. Commercial compression system. 1990.
`Bill Goodman. Compact Pro. Commercial compression system, v1.32. 1991.
`Terry A. Welch, ‘A technique for high performance data compression’, IEEE Computer, 17(6), 8-19 (1984).
`Gilbert Held and Thomas R. Marshall. Data Compression: Techniques and Applications: Hardware and
`Software Considerations. 3rd edn, John Wiley and Sons, 1991.
`Leonid Broul-this, Freeze implementation of LZHuf algorithm. comp.sources.misc archives, Internet, 1991.
`Jean-Loup Gailly, compcompression benchmark (Calgary test corpus). In compcompression FAQ list, 1.
`Gailly, (ed.). 1992.
`Jeffrey S. Vitter. ‘Dynamic Huffman Coding’, ACM Transactions on Mathematical Software, (June 1989).
`J. Ziv and A. Lempel.
`‘A universal algorithm for sequential data compression’, IEEE Transactions on
`Information Theory, 23.(3], 337-343 (1977).
`J. Ziv and A. Lempel. ‘Compression of individual sequences via variable-rate coding‘, IEEE Transactions
`on Information Theory, 24(5), 530-546 (1978).
`Jon Louis Bentley, Daniel D. Sleator, Robert E. Tarjan and Victor K. Wei,
`‘A locally adaptive data
`compression scheme’. Communications of the ACM. 320»~330 (April 1986).
`Yooichi Tagawa, Haruhiko Okumura and Haruyasu Yoshizaki. LZHuf'. encoding/decoding module for
`LHarc. Compression system. version 0.03 (Beta), 1989.
`Haruyasu Yoshizalti, LHA: A high-performance file-compression program. Compression system, version
`2.11, 1991.
`Edward R. Fiala and Daniel H. Greene, ‘Data compression with finite windows‘, Communications of the
`ACM, 490-505 (1989).
`Ellis Horowitz and Sartaj Sahni, Fundamentals of Data Structures in Pascal, Computer Science Press,
`Rockville, Maryland, second edition, 1987.
`Graham Toal. Personal communication. Unpublished, 1992.
`Gerard Salton. Automatic Text Processing.‘ The Transformation, Analysis, and Retrieval of Information by
`Computer, Addison-Wesley. Reading. MA, 1989.
`Ian F. Darwin. file (program). Berkeley Unix operating system, 1987.
`David A. Huffman, ‘A method for the construction of minimum-redundancy codes‘, Proceedings of the
`IRE‘, number 40, 1952, pp. 1098-1101.
`Claude E. Shannon and Warren Weaver, The Mathematical Theory of Communications, University of
`Illinois Press, Urbana and Chicago, 1963.
`Roben Sedgewick, Algorithms, 2nd edn, Addison»Wesley, Reading. MA, 1988.
`Timothy C. Bell, John G. Cleary and Ian H. Witten, Text Compression. Prentice Hall. Englewood Cliffs,
`New Jersey, 1990.
`Sheldon Ross, A First Course in Probability, Macmillan Publishing Company, New York, third edition.
`1988.
`Ian H. Witten, Radford Neal and John G. Cleary, ‘Arithmetic coding for data compression‘, Communica-
`tions ofthe ACM, 30(6), 520-540 (1987).
`Independent JPEG Group. ‘JPEG image compression system‘, think.com FTP archives, Internet, 1994.
`Jean-Loup Gailly.
`comp.compressionlcompcompression.research FAQ list.
`I. Gailly (ed.). URL
`http : //we . cis .ohio-state . edu/hypertext/faq/usenet/compression-faq/top . html, 1994.
`James A. Storer, Image and Text Compression, Kluwer Academic Publishers, Norwell, MA, 1992.
`Graham Toal. C implementation of dynamic Huffman compressor by J. S. Vitter.
`comp.source.misc
`archives, Internet. 1990.
`
` DELL INC., EMC CORP., HPE CO., HPES, LLC -
`DELL INC-, EMC CORP., HPE co., HPES, LLC -
` Ex. 1026, p. 152 of 152
`Ex. 1026, p. 152 of 152

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket