`
`Office of Business Enterprises
`Duplication Services Section
`
`THIS IS TO CERTIFY that the collections of the Library of Congress contain a bound
`volume entitled A GUIDE TO DATA COMPRESSION METHODS, David Salomon, call
`number QA 76.9.D33 S28 2002, Copy 2. The attached - Title Page, Copyright Page, Table of
`Contents Pages and Introduction Pages - are a true and complete representation from that work.
`
`IN WITNESS WHEREOF, the seal of the Library of Congress is affixed hereto on
`May 18, 2018.
`
`Deirdre Scott
`
`
`
`Business Enterprises Officer
`Office of Business Enterprises
`Library of Congress
`
101 Independence Avenue, SE Washington, DC 20540-4917 Tel 202.707.5650 www.loc.gov; duplicationservices@loc.gov
`
`Comcast - Exhibit 1010, page 1
`
`
`
`
`
`
`
`
A Guide to

DATA
COMPRESSION
METHODS

David Salomon

CD-ROM INCLUDED
`
`Comcast - Exhibit 1010, page 2
`
`
`
`
`A Guide to
`
`DATA
`COMPRESSION
`METHODS
`
David Salomon
`
`
`
`
`
`With 92 Illustrations
`
Includes a CD-ROM
`
`
`
`Springer
`
`Comcast - Exhibit 1010, page 3
`
`
`
`
`
`
`David Salomon
`
Department of Computer Science
California State University, Northridge
Northridge, CA 91330-8281
`
`USA
`
`david.salomon@csun.edu
`
`
`Cover Illustration: “Abstract: Yellow, Blue”, 1993, Patricia S. Brown/Superstock.
`
Library of Congress Cataloging-in-Publication Data
Salomon, D. (David), 1938-
A guide to data compression methods/David Salomon.
p. cm.
Includes bibliographical references and index.
ISBN 0-387-95260-8 (pbk.: alk. paper)
1. Data compression (Computer science). I. Title.
QA76.9.D33 S28 2001
005.74'6-dc21 2001-032844
`
`Printed on acid-free paper.
`
© 2002 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of
the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts
in connection with reviews or scholarly analysis. Use in connection with any form of information storage and
retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not
especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and
Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Frank McGuckin; manufacturing supervised by Erica Bresler.
Camera-ready copy prepared from the author's TeX files.
Printed and bound by Hamilton Printing Co., Rensselaer, NY.
Printed in the United States of America.
`
`9 8 7 6 5 4 3 2 1
`
ISBN 0-387-95260-8

SPIN 10796580

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
`
`
`
`Comcast - Exhibit 1010, page 4
`
`
`
`
Contents

Preface ix

Introduction 1

1. Statistical Methods 9
   1 Entropy 9
   2 Variable-Size Codes 10
   3 Decoding 12
   4 Huffman Coding 12
   5 Adaptive Huffman Coding 22
   6 Facsimile Compression 32
   7 Arithmetic Coding 40
   8 Adaptive Arithmetic Coding 52

2. Dictionary Methods 57
   1 LZ77 (Sliding Window) 59
   2 LZSS 62
   3 LZ78 66
   4 LZW 69
   5 Summary 80

3. Image Compression 81
   1 Introduction 82
   2 Image Types 87
   3 Approaches to Image Compression 88
   4 Intuitive Methods 103
   5 Image Transforms 104
   6 Progressive Image Compression 135
   7 JPEG 140
   8 JPEG-LS 159
`Comcast - Exhibit 1010, page 5
`
`
`
`
`
`—
`
`167
`181
`185
`190
`197
`199
`203
`207
`214
`
`,
`
`Video Compression
`
`1
`2
`
`227
`Basic Principles
`Suboptimal Search Methods
`233
`
`
`Audio Compression
`1
`Sound
`
`Digital Audio
`The Human Auditory System
`Conventional Methods
`
`2
`3
`4-
`
`5
`
`2/12
`
`245
`247
`250
`
`i\-'iPEGr—’l Audio Layers
`253
`
`_
`
`Bibliography
`
`Glossary
`
`__
`
`_ __
`
`_
`
`__
`
`167
`
`227
`
`241
`
`269
`
`275
`
`
`
`
`4.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Contents
`
`Wavelet Methods
`1
`Averaging and Difl’erencing
`2
`The Haar Transform
`3
`Subband 'l‘ransforms
`4
`Filter Banks
`5
`Deriving the Filter Coefficients
`6
`The DWT
`7
`Examples
`8
`The Daubechies Wavelets
`9
`SPIH’I’
`
`Joining the Data Compression Community _— 284
`
`Appendix of Algorithms _
`_
`
`_
`
`285
`
`287
`
`Index
`
`Thus a rigidly chronological series of letters would
`present a patchwork of subjects, each of which would
`be difficult to follow. The Table of Contents will show
`in what way I have attempted to avoid this result.
`
`——Charles Darwin, Life and Letters of Charles Darwin
`
`
`
`Comcast - Exhibit 1010, page 6
`
`
`
`
`Introduction
`
Those who use compression software are familiar with terms such as "zip," "implode," "stuffit," "diet," and "squeeze." These are names of programs or methods for compressing data, names chosen to imply compression. However, such names do not reflect the true nature of data compression. Compressing data is not done by stuffing or squeezing it, but by removing any redundancy that's present in the data. The concept of redundancy is central to data compression. Data with redundancy can be compressed. Data without any redundancy cannot be compressed, period.

We all know what information is. We intuitively understand it but we consider it a qualitative concept. Information seems to be one of those entities that cannot be quantified and dealt with rigorously. There is, however, a mathematical field called information theory, where information is handled quantitatively. Among its other achievements, information theory shows how to precisely define redundancy. Here, we try to understand this concept intuitively by pointing out the redundancy in two common types of computer data and trying to understand why redundant data is used in the first place.

The first type of data is text. Text is an important example of computer data. Many computer applications, such as word processing and software compilation, are nonnumeric; they deal with data whose elementary components are characters of text. The computer can store and process only binary information (zeros and ones), so each character of text must be assigned a binary code. Present-day computers use the ASCII code (pronounced "ass-key," short for "American Standard Code for Information Interchange"), although more and more computers use the new Unicode. ASCII is a fixed-size code where each character is assigned an 8-bit code (the code itself occupies seven of the eight bits, and the eighth bit is parity, designed to increase the reliability of the code). A fixed-size code is a natural choice because it makes it easy for software applications to handle characters of text. On the other hand, a fixed-size code is inherently redundant.

In a file of random text, we expect each character to occur approximately the same number of times. However, files used in practice are rarely random. They contain meaningful text, and we know from experience that in typical English text certain letters, such as "E," "T," and "A" are common, whereas other letters, such as "Z" and
`
`Comcast - Exhibit 1010, page 7
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
"Q," are rare. This explains why the ASCII code is redundant, and also points the way to eliminating the redundancy. ASCII is redundant because it assigns to each character, common or rare, the same number (eight) of bits. Removing the redundancy can be done by assigning variable-size codes to the characters, with short codes assigned to the common characters and long codes assigned to the rare ones. This is precisely how Huffman coding (Section 1.4) works.
Imagine two text files A and B with the same text, where A uses ASCII codes and B has variable-size codes. We expect B to be smaller than A, and we say that A has been compressed to B. It is obvious that the amount of compression depends on the redundancy of the particular text and on the particular variable-size codes used in file B. Text where certain characters are very common while others are very rare has much redundancy and will compress well if the variable-size codes are properly assigned. In such a file, the codes of the common characters should be very short, while those of the rare characters can be long. The long codes would not degrade the compression because they would rarely appear in B. Most of B would consist of the short codes. Random text, on the other hand, does not benefit from replacing ASCII with variable-size codes, because the compression achieved by the short codes is canceled out by the long codes. This is a special case of a general rule that says that random data cannot be compressed because it has no redundancy.
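To make the trade-off concrete, here is a small sketch (my illustration, not from the book) that computes Huffman code lengths for the character frequencies of a short string and compares the total bit count against a fixed 8 bits per character:

```python
import heapq
from collections import Counter

def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a Huffman code built from freqs."""
    # Heap items: (weight, tiebreak, {symbol: depth so far}); the tiebreak
    # integer keeps tuples comparable when weights are equal.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)   # merge the two lightest subtrees;
        w2, _, d2 = heapq.heappop(heap)   # every symbol in them gets one bit deeper
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "this is an example of a variable size code"
freqs = Counter(text)
lengths = huffman_code_lengths(freqs)

fixed_bits = 8 * len(text)                               # fixed-size (ASCII-style) encoding
huff_bits = sum(lengths[s] * f for s, f in freqs.items())
print(fixed_bits, huff_bits)   # the variable-size total is much smaller
```

Common characters (here the space and "e") end up with the shortest codes, which is exactly the redundancy-removal the text describes.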
The second type of common computer data is digital images. A digital image is a rectangular array of colored dots, called pixels. Each pixel is represented in the computer by its color code. (In the remainder of this section, the term "pixel" is used for the pixel's color code.) In order to simplify the software applications that handle images, the pixels are all the same size. The size of a pixel depends on the number of colors in the image, and this number is normally a power of 2. If there are 2^k colors in an image, then each pixel is a k-bit number.

There are two types of redundancy in a digital image. The first type is similar to redundancy in text. In any particular image, certain colors may dominate, while others may be infrequent. This redundancy can be removed by assigning variable-size codes to the pixels, as is done with text. The other type of redundancy is much more important and is the result of pixel correlation. As our eyes move along the image from pixel to pixel, we find that in most cases, adjacent pixels have similar colors. Imagine an image containing blue sky, white clouds, brown mountains, and green trees. As long as we look at a mountain, adjacent pixels tend to be similar; all or almost all of them are shades of brown. Similarly, adjacent pixels in the sky are shades of blue. It is only at the horizon, where mountain meets sky, that adjacent pixels may have very different colors. The individual pixels are therefore not completely independent, and we say that neighboring pixels in an image tend to be correlated. This type of redundancy can be exploited in many ways, as described in Chapter 3.
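This correlation is easy to observe numerically. The sketch below (my own illustration; the data is synthetic) compares the average difference between adjacent values in a slowly varying row of pixels against a row of uncorrelated random pixels:

```python
import random

def mean_adjacent_diff(pixels):
    """Average absolute difference between horizontally adjacent pixels."""
    return sum(abs(a - b) for a, b in zip(pixels, pixels[1:])) / (len(pixels) - 1)

random.seed(1)
smooth = [100 + i % 20 for i in range(2000)]           # slowly varying shades ("mountain")
noisy = [random.randrange(256) for _ in range(2000)]   # uncorrelated pixels ("random jumble")

print(mean_adjacent_diff(smooth))   # small: neighbors differ by about 2
print(mean_adjacent_diff(noisy))    # large: roughly 85 for uniform random bytes
```

A predictor that encodes each pixel as its difference from its neighbor would therefore emit mostly small numbers for the smooth row, which is what makes this redundancy exploitable.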
Regardless of the method used to compress an image, the effectiveness of the compression depends on the amount of redundancy in the image. One extreme case is a uniform image. Such an image has maximum redundancy because adjacent pixels are identical. Obviously, such an image is not interesting and is rarely, if ever, used in practice. However, it will compress very well under any image compression method. The other extreme example is an image with uncorrelated pixels. All adjacent pixels
`Comcast - Exhibit 1010, page 8
`
`
`
`
in such an image are very different, so the image redundancy is zero. Such an image will not compress, regardless of the compression method used. However, such an image tends to look like a random jumble of dots and is therefore uninteresting. We rarely need to keep and manipulate such an image, so we rarely need to compress it. Also, a truly random image features small or zero correlation between pixels.
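Both extremes can be demonstrated with any general-purpose compressor. A quick sketch using Python's zlib (my illustration, not the author's):

```python
import os
import zlib

uniform = b"\x00" * 100_000        # maximum redundancy: every byte identical
noise = os.urandom(100_000)        # no redundancy: uncorrelated random bytes

print(len(zlib.compress(uniform)))   # around a hundred bytes
print(len(zlib.compress(noise)))     # slightly larger than the 100,000-byte input
```

The random input actually grows slightly, because the compressed stream must carry zlib's own header and block overhead while finding nothing to remove.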
`
`
`
What with all the ARC war flames going around, and arguments about which program is best, I decided to do something about it and write my OWN.

You've heard of crunching, jamming, squeezing, squashing, packing, crushing, imploding, etc. Now there's TRASHING.

TRASH compresses a file to the smallest size possible: 0 bytes! NOTHING compresses a file better than TRASH! Date/time stamp are not affected, and since the file is zero bytes long, it doesn't even take up any space on your hard disk!

And TRASH is FAST! Files can be TRASHED in microseconds! In fact, it takes longer to go through the various parameter screens than it does to trash the file!

This prerelease version of TRASH is yours to keep and evaluate. I would recommend backing up any files you intend to TRASH first, though...

The next version of TRASH will have graphics and take wildcards:
TRASH C:\PAYRDLL\*.*
...and will even work on entire drives:
TRASH D:
...or be first on your block to trash your system ON PURPOSE!
TRASH ALL
We're even hoping to come up with a way to RECOVER TRASHed files!

From FIDO News, 23 April 1990

The following simple argument illustrates the essence of the statement "Data compression is achieved by reducing or removing redundancy in the data." The argument shows that most data files cannot be compressed, no matter what compression method is used. This seems strange at first because we compress our data files all the time. The point is that most files cannot be compressed because they are random or close to random and therefore have no redundancy. The (relatively) few files that can be compressed are the ones that we want to compress; they are the files we use all the time. They have redundancy, are nonrandom, and are therefore useful and interesting.

Given two different files A and B that are compressed to files C and D, respectively, it is clear that C and D must be different. If they were identical, there would be no way to decompress them and get back file A or file B.

Suppose that a file of size n bits is given and we want to compress it efficiently. Any compression method that can compress this file to, say, 10 bits would be welcome. Even compressing it to 11 bits or 12 bits would be great. We therefore (somewhat arbitrarily) assume that compressing such a file to half its size or better is considered good compression. There are 2^n n-bit files and they would have to be compressed into 2^n different files of sizes less than or equal n/2. However, the total number of these files
`
`Comcast - Exhibit 1010, page 9
`
`
`
`
is

N = 1 + 2 + 4 + ... + 2^(n/2) = 2^(1+n/2) - 1 ≈ 2^(1+n/2),

so only N of the 2^n original files have a chance of being compressed efficiently. The problem is that N is much smaller than 2^n. Here are two examples of the ratio between these two numbers.

For n = 100 (files with just 100 bits), the total number of files is 2^100 and the number of files that can be compressed efficiently is 2^51. The ratio of these numbers is the ridiculously small fraction 2^-49 ≈ 1.78 · 10^-15.

For n = 1000 (files with just 1000 bits, about 125 bytes), the total number of files is 2^1000 and the number of files that can be compressed efficiently is 2^501. The ratio of these numbers is the incredibly small fraction 2^-499 ≈ 6.11 · 10^-151.
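These ratios are easy to verify directly, since Python's arbitrary-precision integers keep the counts exact until the final division (a quick sketch; the function name is mine):

```python
def compressible_fraction(n):
    """Fraction of n-bit files that could map to distinct files of at most n/2 bits."""
    total = 2 ** n                  # number of distinct n-bit files
    short = 2 ** (1 + n // 2) - 1   # files of 0..n/2 bits: 1 + 2 + ... + 2^(n/2)
    return short / total

print(compressible_fraction(100))    # about 1.78e-15
print(compressible_fraction(1000))   # about 6.1e-151
```

Even though 2^1000 is far too large for a float, the quotient of the two big integers is tiny and fits comfortably in double precision.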
`
Most files of interest are at least some thousands of bytes long. For such files, the percentage of files that can be efficiently compressed is so small that it cannot be computed with floating-point numbers even on a supercomputer (the result is zero).

It is therefore clear that no compression method can hope to compress all files or even a significant percentage of them. In order to compress a data file, the compression algorithm has to examine the data, find redundancies in it, and try to remove them. Since the redundancies in data depend on the type of data (text, images, sound, etc.), any compression method has to be developed for a specific type of data and works best on this type. There is no such thing as a universal, efficient data compression algorithm.

The rest of this introduction covers important technical terms used in the field of data compression.
`
The compressor or encoder is the program that compresses the raw data in the input file and creates an output file with compressed (low-redundancy) data. The decompressor or decoder converts in the opposite direction. Notice that the term encoding is very general and has wide meaning, but since we discuss only data compression, we use the name encoder to mean data compressor. The term codec is sometimes used to describe both the encoder and decoder. Similarly, the term companding is short for "compressing/expanding."

A nonadaptive compression method is rigid and does not modify its operations, its parameters, or its tables in response to the particular data being compressed. Such a method is best used to compress data that is all of a single type. Examples are the Group 3 and Group 4 methods for facsimile compression (Section 1.6). They are specifically designed for facsimile compression and would do a poor job compressing any other data. In contrast, an adaptive method examines the raw data and modifies its operations and/or its parameters accordingly. An example is the adaptive Huffman method of Section 1.5. Some compression methods use a two-pass algorithm, where the first pass reads the input file to collect statistics on the data to be compressed, and the second pass does the actual compressing using parameters or codes set by the first pass. Such a method may be called semiadaptive. A data compression method can also be locally adaptive, meaning it adapts itself to local conditions in the input file and varies this adaptation as it moves from area to area in the input. An example is the move-to-front method [Salomon 2000].
`
`
`
`Comcast - Exhibit 1010, page 10
`
`
`
`
Symmetric compression is the case where the compressor and decompressor use the same basic algorithm but work in "opposite" directions. Such a method makes sense for general work, where the same number of files are compressed as are decompressed. In an asymmetric compression method either the compressor or the decompressor may have to work significantly harder. Such methods have their uses and are not necessarily bad. A compression method where the compressor executes a slow, complex algorithm and the decompressor is simple is a natural choice when files are compressed into an archive, where they will be decompressed and used very often, such as mp3 audio files on a CD. The opposite case is useful in environments where files are updated all the time and backups are made. There is only a small chance that a backup file will be used, so the decompressor isn't used very often.

Lossy/lossless compression: Certain compression methods are lossy. They achieve better compression by losing some information. When the compressed file is decompressed, the result is not identical to the original data. Such a method makes sense when compressing images, movies, or sounds. If the loss of data is small, we may not be able to tell the difference. In contrast, text files, especially files containing computer programs, may become worthless if even one bit gets modified. Such files should be compressed only by a lossless compression method. [Two points should be mentioned regarding text files: (1) If a text file contains the source code of a program, many blank spaces can normally be eliminated, since they are disregarded by the compiler anyway. (2) When the output of a word processor is saved in a text file, the file may contain information about the different fonts used in the text. Such information may be discarded if the user wants to save just the text.]

Compression performance: Several quantities are commonly used to express the performance of a compression method.
1. The compression ratio is defined as

    Compression ratio = (size of the output file) / (size of the input file).

A value of 0.6 means that the data occupies 60% of its original size after compression. Values greater than 1 indicate an output file bigger than the input file (negative compression). The compression ratio can also be called bpb (bit per bit), since it equals the number of bits in the compressed file needed, on average, to compress one bit in the input file. In image compression, the similar term bpp stands for "bits per pixel." In modern, efficient text compression methods, it makes sense to talk about bpc (bits per character), the number of bits it takes, on average, to compress one character in the input file.

Two more terms should be mentioned in connection with the compression ratio. The term bitrate (or "bit rate") is a general term for bpb and bpc. Thus, the main goal of data compression is to represent any given data at low bit rates. The term bit budget refers to the functions of the individual bits in the compressed file. Imagine a compressed file where 90% of the bits are variable-size codes of certain symbols and the remaining 10% are used to encode certain tables that are needed by the decompressor. The bit budget for the tables is 10% in this case.
`
`Comcast - Exhibit 1010, page 11
`
`
`
`
` 6
`
`
`Introduction
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`2. The inverse of the compression ratio is the compression factor:
`
`size of the input file
`_
`_
`Compressron factor 2 —_~-_—- .
`. -.
`swe of the output file
`
`In this case values greater than 1 indicate compression, and values less than 1 imply
`expansion. This measure seems natural to many people, since the bigger the factor,
`the better the compression. This measure is distantly related to the sparseness ratio, a.
`performance measure discussed in Section 4.1.2.
`3. The expression 100 X (1 a compression ratio) is also a. reasonable measure of compres—
`sion performance. A value of 60 means that the output file occupies 40% of its original
`size (or that the compression has resulted in savings of 60%).
`4. In image compression, the quantity bpp (bits per pixel) is commonly used. It equals
`the number of bits needed, on average, to compress one pixel of the image. This quantity
`should always be compared with the bpp before compression.
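The measures above are related by simple arithmetic. A small sketch computing them for one input/output pair (the sizes are made up for illustration):

```python
def performance(input_bits, output_bits):
    """Compute the standard compression-performance measures."""
    ratio = output_bits / input_bits     # compression ratio (bpb); < 1 means compression
    factor = input_bits / output_bits    # compression factor, the inverse of the ratio
    savings = 100 * (1 - ratio)          # percent of the original size saved
    return ratio, factor, savings

# A 1000-character text file (8000 bits) compressed to 4800 bits, i.e. 4.8 bpc:
ratio, factor, savings = performance(8000, 4800)
print(ratio, factor, savings)   # 0.6, about 1.67, 40.0
```

Note how the same result reads three ways: a ratio of 0.6, a factor of about 1.67, and savings of 40%.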
`
The probability model. This concept is important in statistical data compression methods. Sometimes, a compression algorithm consists of two parts, a probability model and the compressor itself. Before the next data item (bit, byte, pixel, or anything else) can be compressed, the model is invoked and is asked to estimate the probability of the data item. The item and the probability are then sent to the compressor, which uses the estimated probability to compress the item. The better the probability estimate, the better the item is compressed.

Here is an example of a simple model for a black and white image. Each pixel in such an image is a single bit. Assume that after reading 1000 pixels and compressing them, pixel 1001 is read. What is the probability that this pixel is black? The model can simply count the numbers of black and white pixels read so far. If 350 of the 1000 pixels were black, then the model can assign pixel 1001 a probability of 350/1000 = 0.35 of being black. The probability and the pixel (which may, of course, be either black or white) are then sent to the compressor. The point is that the decompressor can easily calculate the same probabilities before it decodes the 1001st pixel.
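The counting model just described takes only a few lines of code. In this sketch (mine, not the book's), the same function serves both encoder and decoder, since each sees the identical history of already-coded pixels:

```python
def prob_black(history):
    """Estimate P(next pixel is black) from the pixels coded so far (1 = black)."""
    if not history:
        return 0.5          # no evidence yet; assume both colors equally likely
    return history.count(1) / len(history)

# 350 of the first 1000 pixels were black:
history = [1] * 350 + [0] * 650
print(prob_black(history))   # 0.35, the estimate for pixel 1001
```

Because both sides update the same counts after every pixel, the decompressor reproduces exactly the probability the compressor used; this symmetry is what makes an adaptive model decodable.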
`
The term alphabet refers to the set of symbols in the data being compressed. An alphabet may consist of the two bits 0 and 1, of the 128 ASCII characters, of the 256 possible 8-bit bytes, or of any other symbols.
`
The performance of any compression method is limited. No method can compress all data files efficiently. Imagine applying a sophisticated compression algorithm X to a data file A and obtaining a compressed file of just 1 bit! Any method that can compress an entire file to a single bit is excellent. Even if the compressed file is 2 bits long, we still consider it to be highly compressed, and the same is true for compressed files of 3, 4, 5, and even more bits. It seems reasonable to define efficient compression as compressing a file to at most half its original size. Thus, we may be satisfied (even happy) if we discover a method X that compresses any data file A to a file B of size less than or equal half the size of A.

The point is that different files A should be compressed to different files B, since otherwise decompression is impossible. If method X compresses two files C and D to the
`
`
`
`Comcast - Exhibit 1010, page 12
`
`
`
`
same file E, then the X decompressor cannot decompress E. Once this is clear, it is easy to show that method X cannot compress all files efficiently. If file A is n bits long, then there are 2^n different files A. At the same time, the total number of all files B whose size is less than or equal half the size of A is 2^(1+n/2) - 2. For n = 100, the number of A files is N_A = 2^100 ≈ 1.27 · 10^30 but the number of B files is only N_B = 2^(1+50) - 2 ≈ 2.25 · 10^15. The ratio N_B/N_A is the incredibly small fraction 1.77 · 10^-15. For larger values of n, this ratio is much smaller.

The conclusion is that method X can compress efficiently just a small fraction of all the files A. The great majority of files A cannot be compressed efficiently since they are random or close to random.

This sad conclusion, however, should not cause any reader to close the book with a sigh and turn to more promising pursuits. The interesting fact is that out of the 2^n files A, the ones we actually want to compress are normally the ones that compress well. The ones that compress badly are normally random or close to random and are therefore uninteresting, unimportant, and not candidates for compression, transmission, or storage.

The days just prior to marriage are like a
snappy introduction to a tedious book.

—Wilson Mizner
`
`Comcast - Exhibit 1010, page 13
`
`
`